NFS won't finish

From: bob
Date: Fri Apr 12 2019 - 16:04:11 EST


Kernel: Linux freedom 5.1.0-rc4 #1 SMP Sun Apr 7 18:50:31 MDT 2019 x86_64 x86_64 x86_64 GNU/Linux

Sending large files to NAS box via NFS failed (It looks like files are getting transferred, but NFS

gets stuck/won't finish.

Note:
1. NFS works ok with kernel 5.0.0

2. To test NFS, (using the5.1.0-rc4 kernel), I connected to my NAS box and moved a small log file
ÂÂ (less that 20kB)Â and it moved from computer to NAS without problem.
ÂÂ I then deleted the small log file from the NAS box, again without problem.
ÂÂ (connected via NFS).
ÂÂ I then copied a database archive (24929 bytes) from NAS box to computer,
 again without fail. But today I moved 3 large tarball/archives from
ÂÂ computer to NAS box (TX:ÂÂ cum:ÂÂ 8.50GB), and it seems
ÂÂ to have moved OK, but NFS does not want to complete the transfer.
ÂÂ I can't get a directory listing from where I started the copy, and
ÂÂ the process queue shows:

rootÂÂÂÂÂ 6176Â 0.0Â 0.0Â 20824Â 2504 pts/5ÂÂÂ D+ÂÂ 10:28ÂÂ 0:11 cp site.Apr-05-2019.tgz site.Apr-12-2019.tgz site.Mar-29-2019.tgz nas/website/BACKUP/2019.04.12

DMESG:

perf: interrupt took too long (2694 > 2500), lowering kernel.perf_event_max_sample_rate to 74250
[ 1644.507579] perf: interrupt took too long (3369 > 3367), lowering kernel.perf_event_max_sample_rate to 59250
[ 2349.205399] FS-Cache: Loaded
[ 2349.348842] FS-Cache: Netfs 'nfs' registered for caching
[ 2349.652305] NFS: Registering the id_resolver key type
[ 2349.652316] Key type id_resolver registered
[ 2349.652317] Key type id_legacy registered
[ 3595.680685] nfs: server 192.168.20.114 not responding, still trying
[ 3595.680692] nfs: server 192.168.20.114 not responding, still trying
[ 3595.680695] nfs: server 192.168.20.114 not responding, still trying
[ 3626.400412] INFO: task cp:6176 blocked for more than 120 seconds.
[ 3626.400418]ÂÂÂÂÂÂ Tainted: GÂÂÂÂÂÂÂÂÂ IÂÂÂÂÂÂ 5.1.0-rc4 #1
[ 3626.400419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3626.400422] cpÂÂÂÂÂÂÂÂÂÂÂÂÂ DÂÂÂ 0Â 6176ÂÂ 5890 0x00000000
[ 3626.400425] Call Trace:
[ 3626.400434]Â ? __schedule+0x2a1/0x870
[ 3626.400436]Â schedule+0x28/0x70
[ 3626.400440]Â io_schedule+0x12/0x40
[ 3626.400443]Â wait_on_page_bit_common+0x10a/0x280
[ 3626.400447]Â ? file_check_and_advance_wb_err+0xc0/0xc0
[ 3626.400450]Â __filemap_fdatawait_range+0xd8/0x140
[ 3626.400453]Â filemap_write_and_wait_range+0x3b/0x70
[ 3626.400471]Â nfs_file_fsync+0x44/0x1f0 [nfs]
[ 3626.400474]Â filp_close+0x2a/0x70
[ 3626.400475]Â __x64_sys_close+0x1e/0x50
[ 3626.400477]Â do_syscall_64+0x4f/0x100
[ 3626.400479]Â entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3626.400481] RIP: 0033:0x7f6be90958d4
[ 3626.400485] Code: Bad RIP value.
[ 3626.400486] RSP: 002b:00007ffe07ec4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 3626.400487] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6be90958d4
[ 3626.400488] RDX: 0000000000020000 RSI: 000055b82b141000 RDI: 0000000000000004
[ 3626.400488] RBP: 00007ffe07ec4770 R08: 0000000000020000 R09: 0000000000000001
[ 3626.400489] R10: 0000000000020000 R11: 0000000000000246 R12: 00007ffe07ec4970
[ 3626.400490] R13: 0000000000000000 R14: 00007ffe07ec4900 R15: 00007ffe07ec52db
[ 3747.231331] INFO: task cp:6176 blocked for more than 241 seconds.
[ 3747.231337]ÂÂÂÂÂÂ Tainted: GÂÂÂÂÂÂÂÂÂ IÂÂÂÂÂÂ 5.1.0-rc4 #1
[ 3747.231339] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

According to NTOP, it looks like data was transferred:

TX:ÂÂÂÂÂÂÂÂÂÂÂÂ cum:ÂÂ 8.50GBÂÂ peak:ÂÂÂ 208bÂÂ rates: 0bÂÂÂÂÂ 0bÂÂÂÂ 18b

My Hardware:

04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
ÂÂÂÂÂÂÂ Subsystem: ASUSTeK Computer Inc. Motherboard
ÂÂÂÂÂÂÂ Flags: bus master, fast devsel, latency 0, IRQ 26
ÂÂÂÂÂÂÂ Memory at fb9fc000 (64-bit, non-prefetchable) [size=16K]
ÂÂÂÂÂÂÂ I/O ports at b800 [size=256]
ÂÂÂÂÂÂÂ Expansion ROM at fb9c0000 [disabled] [size=128K]
ÂÂÂÂÂÂÂ Capabilities: <access denied>
ÂÂÂÂÂÂÂ Kernel driver in use: sky2
ÂÂÂÂÂÂÂ Kernel modules: sky2

06:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
ÂÂÂÂÂÂÂ Subsystem: ASUSTeK Computer Inc. Motherboard
ÂÂÂÂÂÂÂ Flags: bus master, fast devsel, latency 0, IRQ 25
ÂÂÂÂÂÂÂ Memory at fbbfc000 (64-bit, non-prefetchable) [size=16K]
ÂÂÂÂÂÂÂ I/O ports at d800 [size=256]
ÂÂÂÂÂÂÂ Expansion ROM at fbbc0000 [disabled] [size=128K]
ÂÂÂÂÂÂÂ Capabilities: <access denied>
ÂÂÂÂÂÂÂ Kernel driver in use: sky2
ÂÂÂÂÂÂÂ Kernel modules: sky2

Drivers:

lsmod | grep nfs yields:
nfsv3ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 53248Â 1
nfsv4ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 692224Â 0
nfsÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 315392Â 3 nfsv4,nfsv3
fscacheÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 380928Â 2 nfsv4,nfs
nfsdÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 413696Â 13
auth_rpcgssÂÂÂÂÂÂÂÂÂÂ 102400Â 2 nfsd,rpcsec_gss_krb5
nfs_aclÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 16384Â 2 nfsd,nfsv3
lockdÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 106496Â 3 nfsd,nfsv3,nfs
graceÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 16384Â 2 nfsd,lockd
sunrpcÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 438272Â 28 nfsd,nfsv4,auth_rpcgss,lockd,nfsv3,rpcsec_gss_krb5,nfs_acl,nfs


I can ftp to the NAS box and see that two of the three files completed:-

rw-r--r--ÂÂÂ 1 0ÂÂÂÂÂÂÂÂÂ 0ÂÂÂÂÂÂÂÂÂ 5027646357 Apr 12 10:31 site.Apr-05-2019.tgz
-rw-r--r--ÂÂÂ 1 0ÂÂÂÂÂÂÂÂÂ 0ÂÂÂÂÂÂÂÂÂ 5152017050 Apr 12 10:34 site.Apr-12-2019.tgz


Sorry for the long post, but I wanted to be complete. Perhaps its a corner case. I'm fairly certain the

NAS machine is using the ext4 filesystem and I don't think its a filesystem size limit.

Thanks in advance, please email me directly as I am not on the list,

Bob