Re: [netfs/cifs - Linux 6.14] loop on file cat + file copy when files are on CIFS share

From: Nicolas Baranger
Date: Thu Apr 24 2025 - 04:40:30 EST


[Resending mail in plain-text version, sorry!]

Hi Paulo

Thanks again for your help, and sorry for this new mail, but I think it could be relevant.

In fact, I think there is something wrong:

After a remount, I successfully get the correct buffer size values in /proc/mounts (those defined in /etc/fstab).

grep cifs /proc/mounts
//10.0.10.100/FBX24T /mnt/fbx/FBX-24T cifs rw,nosuid,nodev,noexec,relatime,vers=3.1.1,cache=none,upcall_target=app,username=*****,domain=*****,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.10.100,file_mode=0666,dir_mode=0755,iocharset=utf8,soft,nounix,serverino,mapposix,mfsymlinks,reparse=nfs,rsize=4194304,wsize=4194304,bsize=16777216,retrans=1,echo_interval=60,actimeo=1,closetimeo=1 0 0

uname -r
6.13.8.1-ast-nba0-amd64



But here is what I observe: a 'dd' with a block size of 65536 bytes or smaller works fine:
LANG=en_US.UTF-8

dd if=/dev/urandom of=/mnt/fbx/FBX-24T/dd.test3 bs=65536 status=progress conv=notrunc oflag=direct count=128
128+0 records in
128+0 records out
8388608 bytes (8.4 MB, 8.0 MiB) copied, 0.100398 s, 83.6 MB/s



But a 'dd' with a block size bigger than 65536 bytes does not work:
LANG=en_US.UTF-8

dd if=/dev/urandom of=/mnt/fbx/FBX-24T/dd.test3 bs=65537 status=progress conv=notrunc oflag=direct count=128
dd: error writing '/mnt/fbx/FBX-24T/dd.test3'
dd: closing output file '/mnt/fbx/FBX-24T/dd.test3': Invalid argument

And the kernel reports (error -32 is EPIPE):
Apr 24 10:01:37 14RV-SERVER.14rv.lan kernel: CIFS: VFS: \\10.0.10.100 Error -32 sending data on socket to server



If I let the systemd x-systemd.automount option mount the share, it configures the mount with rsize=65536,wsize=65536 (visible in /proc/mounts), and I'm able to send data whatever the size of each block in the transfer stream.
Example:

grep cifs /proc/mounts
//10.0.10.100/FBX24T /mnt/fbx/FBX-24T cifs rw,nosuid,nodev,noexec,relatime,vers=3.1.1,cache=none,upcall_target=app,username=*****,domain=*****,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.10.100,file_mode=0666,dir_mode=0755,iocharset=utf8,soft,nounix,serverino,mapposix,mfsymlinks,reparse=nfs,rsize=65536,wsize=65536,bsize=16777216,retrans=1,echo_interval=60,actimeo=1,closetimeo=1 0 0

dd if=/dev/urandom of=/mnt/fbx/FBX-24T/dd.test3 bs=64M status=progress conv=notrunc oflag=direct count=128
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 42 s, 203 MB/s
128+0 records in
128+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 42.2399 s, 203 MB/s
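
For reference, the corresponding fstab entry looks roughly like this (a sketch: the credentials file is illustrative and some options are trimmed; the share, mount point and buffer sizes are the real ones):

//10.0.10.100/FBX24T /mnt/fbx/FBX-24T cifs credentials=/etc/cifs.cred,vers=3.1.1,cache=none,rsize=4194304,wsize=4194304,bsize=16777216,x-systemd.automount 0 0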


To conclude: if I force the fstab values bigger than 65536 to be taken into account (visible in /proc/mounts), the transfer fails unless I stream it in blocks of at most 65536 bytes. If I let systemd configure rsize and wsize at 65536, I can stream the transfer in blocks of any size, even much bigger ones (1024 times bigger in the example above).


Let me know if you need further testing

Kind regards
Nicolas




On 2025-04-24 09:40, Nicolas Baranger wrote:

Hi Paulo

Thanks again for help.

I'm sorry, I made a mistake in my answer yesterday:

I wrote that, after a lot of testing, the mount buffer values were rsize=65536, wsize=65536, bsize=16777216, ...

The actual values in /etc/fstab are:
rsize=4194304,wsize=4194304,bsize=16777216

But the negotiated values in /proc/mounts are:
rsize=65536,wsize=65536,bsize=16777216

And I don't know if it's related, but I have:
grep -i maxbuf /proc/fs/cifs/DebugData
CIFSMaxBufSize: 16384

I've just forced a manual 'mount -o remount' and now I have the correct values in /proc/mounts (SMB version is 3.1.1).
Where does this behavior come from?

After some searching, it appears that when the CIFS share is mounted via the systemd x-systemd.automount option (for example by doing 'ls' in the mount point directory), the negotiated values are:
rsize=65536,wsize=65536,bsize=16777216
If I umount/remount manually, the negotiated values are those defined in /etc/fstab!
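
For reference, the manual sequence I use to check this (a sketch; the mount point is the real one from my fstab, the comment shows what I then observe):

umount /mnt/fbx/FBX-24T
mount /mnt/fbx/FBX-24T
grep cifs /proc/mounts   # now shows rsize=4194304,wsize=4194304 again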

I don't know if it's normal behavior, but it is a source of errors/mistakes and it makes troubleshooting performance issues harder.

Kind regards
Nicolas

On 2025-04-23 18:28, Nicolas Baranger wrote:

Hi Paulo

Thanks for the answer and for all the explanations and help.

I'm happy you found those 2 bugs and are starting to patch them.
Reading your answer, I want to recall that I already found a bug in cifs DIO starting from Linux 6.10 (when cifs started to use netfs to do its I/O), and it was fixed by David and Christoph.
Full story here: https://lore.kernel.org/all/14271ed82a5be7fcc5ceea5f68a10bbd@xxxxxxxxxxxxx/T/

I've noticed that you disabled caching with 'cache=none', is there any
particular reason for that?
Yes, it's related to the use case described in the other bug report:
For backing up servers, I've got some KSMBD cifs shares on which there are some 4TB+ sparse files (back-files) which are LUKS + BTRFS formatted.
The cifs share is mounted on the servers, and each server mounts its own back-file as a block device and makes its backups inside this encrypted disk file.
Due to performance issues, the disk files must use 4KB blocks and be attached on the servers with the losetup DIO option (+ the 4K block size option).
When I use anything other than 'cache=none', the BTRFS filesystem on the back-file sometimes gets corrupted, and I also need to mount the BTRFS filesystem with 'space_cache=v2' to avoid filesystem corruption.
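
The attach sequence on each server looks roughly like this (a sketch: the back-file path and the mapping name are illustrative, the losetup options are the ones described above):

losetup --find --show --direct-io=on --sector-size 4096 /mnt/fbx/FBX-24T/server1-back.img   # prints the allocated loop device, e.g. /dev/loop0
cryptsetup open /dev/loop0 server1-backup
mount -o space_cache=v2 /dev/mapper/server1-backup /mnt/backup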

Have you also set rsize, wsize and bsize mount options? If so, why?
After a lot of testing, the mount buffer values rsize=65536, wsize=65536, bsize=16777216 are the ones which provide the best performance with no corruption on the back-file filesystem; with these options a ~2TB backup is possible in a few hours, during the ~1 AM -> ~5 AM timeframe each night.

For me it's important that kernel async DIO on netfs continues to work, as it's used by my whole production backup system (the transfer speed ratio with DIO compared to without is between 10 and 25).

I will try the patch "[PATCH] netfs: Fix setting of transferred bytes with short DIO reads", thanks

Let me know if you need further explanations,

Kind regards
Nicolas Baranger

On 2025-04-22 01:45, Paulo Alcantara wrote:

Nicolas Baranger <nicolas.baranger@xxxxxx> writes:

If you need more traces or details on (both?) issues:

- 1) infinite loop issue during 'cat' or 'copy' since Linux 6.14.0

- 2) (don't know if it's related) the very high number of TCP packets of
only a few bytes each transmitted in the SMB transaction (more than a
hundred) for a 5-byte file transfer under Linux 6.13.8

According to your mount options and network traces, cat(1) is attempting
to read 16M from the 'toto' file, in which case netfslib will create 256
subrequests (bsize=16777216 / rsize=65536) to handle the 64K reads.

The first 64K read at offset 0 succeeds and the server returns 5 bytes;
the client then sets NETFS_SREQ_HIT_EOF to indicate that this subrequest
hit the EOF. The next subrequests will still be processed by netfslib
and sent to the server, but they all fail with STATUS_END_OF_FILE.

So, the problem is with short DIO reads in netfslib, which are not being
handled correctly. netfslib is returning a fixed number of bytes read to
every read(2) call in your cat command: 16711680 bytes (16777216 - 65536),
which is the offset of the last subrequest. This makes cat(1) retry
forever, as netfslib is failing to return the correct number of bytes
read, including at EOF.

While testing a potential fix, I also found other problems with DIO in
cifs.ko, so I'm working with Dave to get the proper fixes for both
netfslib and cifs.ko.

I've noticed that you disabled caching with 'cache=none', is there any
particular reason for that?

Have you also set rsize, wsize and bsize mount options? If so, why?

If you want to keep 'cache=none', then a possible workaround for you
would be making rsize and wsize always greater than bsize. The default
values (rsize=4194304,wsize=4194304,bsize=1048576) would do it.
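
For example (a sketch of a manual mount with only the relevant options; the credentials file is a placeholder for your real authentication options):

mount -t cifs //10.0.10.100/FBX24T /mnt/fbx/FBX-24T -o credentials=/etc/cifs.cred,vers=3.1.1,cache=none,rsize=4194304,wsize=4194304,bsize=1048576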