Re: [PATCH] char/mem: only use {read,write}_iter, not the old {read,write} functions

From: Al Viro
Date: Sat May 21 2022 - 13:52:06 EST


On Fri, May 20, 2022 at 03:44:04PM +0000, Al Viro wrote:
> On Fri, May 20, 2022 at 09:32:34AM -0600, Jens Axboe wrote:
>
> > Didn't look closer, but I'm assuming this is _mostly_ tied to needing to
> > init 48 bytes of kiocb for each one. There might be ways to embed a
> > sync_kiocb inside the kiocb for the bits we need there, at least that
> > could get us down to 32 bytes.
>
> My bet would be on iocb_flags() (and kiocb_set_rw_flags()) tests and
> pointer-chasing, actually. I'd been sick on and off since early November,
> trying to dig myself from under the piles right now. Christoph's
> patches in that area are somewhere in the pile ;-/

FWIW, I can't find them, assuming I'm not misremembering in the first place.

iocb_flags() is an atrocity, indeed. Look what happens in new_sync_write():

[edx holds file->f_flags]
movl %edx, %eax
shrl $6, %eax
andl $16, %eax // eax = (edx >> 6) & 16; maps O_APPEND to IOCB_APPEND
movl %eax, %ecx
orl $131072, %ecx
testb $64, %dh
cmovne %ecx, %eax // eax = edh & 0x4000 ? eax | 0x20000;
testb $16, %dh // if (edx & O_DSYNC)
jne .L175 // goto L175; branch not taken
movq 216(%rdi), %rcx // rcx = file->f_mapping; // fetch
movq (%rcx), %rcx // rcx = rcx->host; // fetch
movq 40(%rcx), %rsi // rsi = rcx->i_sb; // fetch
testb $16, 80(%rsi) // if (!(rsi->s_flags & 0x10)) // fetch
je .L198 // goto L198; // branch likely taken
L175:
orl $2, %eax // eax |= IOCB_DSYNC;
L176:
movl %eax, %ecx
orl $4, %ecx
andl $1048576, %edx
cmovne %ecx, %eax // eax = edx & __O_SYNC ? eax | IOCB_SYNC : eax;

movq %gs:current_task, %rdx # rdx = current;
movq 2192(%rdx), %rcx # rcx = rdx->io_contenxt;
movl $16388, %edx # edx = IOPRIO_DEFAULT;
testq %rcx, %rcx # if (!rcx)
je .L178 # goto L178;
movzwl 12(%rcx), %edx # edx = rcx->ioprio;
L178:
...
fill iov_iter
call ->write_iter() and bugger off

L198:
testb $1, 12(%rcx) // if (rcx->i_flags & S_SYNC)
je .L176
jmp .L175

IOCB_DSYNC part is bloody awful. To add insult to injury, we end up
doing it in new_sync_read() as well ;-/ Its validity is dubious, BTW -
we only get away with that since O_DSYNC is ignored for all in-tree
character devices and since for block ones ->f_mapping->host->i_sb
ends up pointing to blockdev_superblock, which won't be mounted
sync.

I've sent a patch into old thread ([RFC] what to do with IOCB_DSYNC?);
it's completely untested, though.