Re: Regression: NULL pointer dereference after NFS_V4_2_READ_PLUS (commit 7fd461c47)

From: Olga Kornievskaia
Date: Thu Feb 16 2023 - 12:40:33 EST


On Tue, Feb 14, 2023 at 6:08 AM Krzysztof Kozlowski
<krzysztof.kozlowski@xxxxxxxxxx> wrote:
>
> On 12/02/2023 15:05, Anna Schumaker wrote:
> >>> From ac2d6c501dbcdb306480edaee625b5496f1fb4f5 Mon Sep 17 00:00:00 2001
> >>> From: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
> >>> Date: Fri, 10 Feb 2023 15:50:22 -0500
> >>> Subject: [PATCH] NFSv4.2: Rework scratch handling for READ_PLUS
> >>>
> >>
> >> Patch is corrupted - maybe mail program reformatted it when sending:
> >>
> >> Applying: NFSv4.2: Rework scratch handling for READ_PLUS
> >> error: corrupt patch at line 12
> >> Patch failed at 0001 NFSv4.2: Rework scratch handling for READ_PLUS
> >
> > That's weird. I wasn't expecting gmail to reformat the patch but I
> > guess it did. I've added it as an attachment so that shouldn't happen
> > again.
>
> Still null ptr (built on 420b2d4 with your patch):
>
> [ 144.690844] mmiocpy from xdr_inline_decode (net/sunrpc/xdr.c:1419 net/sunrpc/xdr.c:1454)
> [ 144.695950] xdr_inline_decode from nfs4_xdr_dec_read_plus (fs/nfs/nfs42xdr.c:1063 fs/nfs/nfs42xdr.c:1147 fs/nfs/nfs42xdr.c:1360 fs/nfs/nfs42xdr.c:1341)
> [ 144.702452] nfs4_xdr_dec_read_plus from call_decode (net/sunrpc/clnt.c:2595)
> [ 144.708429] call_decode from __rpc_execute (include/asm-generic/bitops/generic-non-atomic.h:128 net/sunrpc/sched.c:954)
> [ 144.713538] __rpc_execute from rpc_async_schedule (include/linux/sched/mm.h:336 net/sunrpc/sched.c:1035)
> [ 144.719170] rpc_async_schedule from process_one_work (include/linux/jump_label.h:260 include/linux/jump_label.h:270 include/trace/events/workqueue.h:108 kernel/workqueue.c:2294)
> [ 144.725238] process_one_work from worker_thread (include/linux/list.h:292 kernel/workqueue.c:2437)
> [ 144.730782] worker_thread from kthread (kernel/kthread.c:378)
> [ 144.735547] kthread from ret_from_fork (arch/arm/kernel/entry-common.S:149)

My 2cents...

From what I can tell, read_plus only calls xdr_inline_decode() for
"numbers" (eof, #segs, type, offset, length), and we always expect
__xdr_inline_decode() to return a non-null "p". But if
__xdr_inline_decode() returned null, the code would call
xdr_copy_to_scratch(), which would ultimately call memcpy().
xdr_copy_to_scratch() expects the scratch buffer to be set up. However,
as I said, for the decode of numbers we don't set up the scratch
space, which then leads to this oops. Now, the reason
__xdr_inline_decode() would return a null pointer is if it ran out of
its provided xdr space, which is sized by decode_read_plus_maxsz:

#define NFS42_READ_PLUS_DATA_SEGMENT_SIZE \
                (1 /* data_content4 */ + \
                 2 /* data_info4.di_offset */ + \
                 1 /* data_info4.di_length */)
#define decode_read_plus_maxsz  (op_decode_hdr_maxsz + \
                 1 /* rpr_eof */ + \
                 1 /* rpr_contents count */ + \
                 NFS42_READ_PLUS_DATA_SEGMENT_SIZE)

While a data segment needs (2) + (1), a hole segment needs (2) +
(2), as both its offset and length are longs.
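
To make the word counts concrete, here is a small userspace sketch in C
(not kernel code). The SKETCH_ names are hypothetical. The field sizes
assume the RFC 7862 layout as I read it: offsets and the hole length are
64-bit (two XDR words each), and the data length is a 32-bit opaque
length (one word). Both counts include the one-word data_content4 type.

```c
#include <assert.h>
#include <stddef.h>

/* One XDR word is 4 bytes. */
#define SKETCH_XDR_UNIT 4

/* Hypothetical names: per-segment header sizes in XDR words. */
#define SKETCH_DATA_SEGMENT_WORDS \
	(1 /* data_content4 */ + \
	 2 /* offset, 64-bit */ + \
	 1 /* data length, 32-bit */)
#define SKETCH_HOLE_SEGMENT_WORDS \
	(1 /* data_content4 */ + \
	 2 /* offset, 64-bit */ + \
	 2 /* hole length, 64-bit */)

/* Bytes a hole segment header needs beyond a data segment header. */
static inline size_t sketch_hole_shortfall_bytes(void)
{
	return (SKETCH_HOLE_SEGMENT_WORDS - SKETCH_DATA_SEGMENT_WORDS) *
	       SKETCH_XDR_UNIT;
}

/* Compile-time check (C11) that a hole header is the larger of the two. */
static_assert(SKETCH_HOLE_SEGMENT_WORDS > SKETCH_DATA_SEGMENT_WORDS,
	      "a hole header is larger than a data header");
```

If these assumptions hold, a reply budget sized for one data segment
header is one XDR word short of what a hole segment header needs.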

While a "correct" maxsz is important for page alignment for reads, it
might mean we are not providing enough space when there are hole
segments? It seems weird that the spec has hole length and data length
of different types (long and int).
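
A toy model of the suspected failure, in userspace C with hypothetical
names (sketch_stream, sketch_inline_decode, sketch_try). It is a bounded
cursor that hands out words until the reserved space is used up and then
returns NULL, loosely mirroring what I described for
__xdr_inline_decode(). It is a simplification and not the sunrpc
implementation, which also deals with page data and the scratch buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an XDR receive buffer; not struct xdr_stream. */
struct sketch_stream {
	const uint32_t *p;   /* next word to decode */
	const uint32_t *end; /* one past the last reserved word */
};

/* Return a pointer to nwords words and advance the cursor, or return
 * NULL if fewer than nwords words remain in the reserved space. */
static const uint32_t *sketch_inline_decode(struct sketch_stream *s,
					    size_t nwords)
{
	const uint32_t *q = s->p;

	if ((size_t)(s->end - s->p) < nwords)
		return NULL;
	s->p += nwords;
	return q;
}

/* Decode one segment header: type (1 word), offset (2 words), then the
 * length, which is 1 word for data and 2 words for a hole.
 * Returns 0 on success, -1 if the reserved space ran out. */
static int sketch_decode_segment(struct sketch_stream *s, size_t len_words)
{
	if (!sketch_inline_decode(s, 1))
		return -1;
	if (!sketch_inline_decode(s, 2))
		return -1;
	if (!sketch_inline_decode(s, len_words))
		return -1;
	return 0;
}

/* Try to decode one segment header with reserved_words words available.
 * reserved_words must not exceed the 8-word backing buffer. */
static int sketch_try(size_t reserved_words, size_t len_words)
{
	static const uint32_t buf[8];
	struct sketch_stream s = { buf, buf + reserved_words };

	return sketch_decode_segment(&s, len_words);
}
```

With four words reserved (a data-sized budget), sketch_try(4, 1)
succeeds, while sketch_try(4, 2) fails on the last field. In the kernel,
that failing call is where I would expect the fallback to
xdr_copy_to_scratch() to happen.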

>
>
>
> Best regards,
> Krzysztof
>