Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

From: Mike Galbraith
Date: Wed Feb 26 2014 - 22:26:50 EST


On Thu, 2014-02-27 at 00:52 +0000, Ken Moffat wrote:
> Hi,
>
> Short summary : on 3.13.5, rm -rf of an application source
> directory on an ext4 filesystem sometimes takes forever (probably
> isn't going anywhere), with one CPU pegged at all-but 100% utilization.
>
> I've nearly finished building a new system from source, to check
> various desktop packages in linuxfromscratch. On this build, much of
> it is things I don't normally use and I needed to upgrade my
> buildscripts, so most of it was built in chroot using 3.10.32. But
> late last night I booted the new system using 3.13.5 to finish the
> build. This morning I discovered that rm -rf for the icedtea source
> directory was still running, and had taken over 5 hours of CPU time
> (one CPU seemed to be running at close to 100%, the others had dropped
> to their slowest frequency). That script was running as root (yeah,
> but it's a new system) and it looks as if /etc/passwd~ had got
> trashed, because I could no longer su or login. Not sure if that is
> related, at this stage it might just be a side-effect of my scripts.
>
> Booted another system, chrooted, fixed up passwords. Started
> again after commenting out icedtea - I hadn't intended to build
> what was an old version, I'd just forgotten it was in this script -
> that's why I do things in userspace, not the kernel :-(
>
> Continued with remaining packages, but a couple of hours later I
> saw a similar "one CPU at 100%, rm -rf GConf source taking forever"
> problem. Dumped all the processes with Alt-SysRQ-T [ huge log ] but
> at that point 'rm' was merely 'ready' so I doubt there is anything
> useful to see in the log.
>
> Built 3.13.4, booted to that. So far, everything looks good - but
> I'm now building the _current_ version of icedtea, so if this isn't
> a new 3.13.5 problem I guess I'm fairly likely to see it tomorrow.
>
> Meanwhile, any suggestions about how I can debug this if I hit it
> again, please ?

I would start with strace to see if a task is looping in userspace, then
move on to perf top -g -p <pid> (or perf record/report) to peek at what
it's up to in the kernel. Once you have the where, trace_printk() is
the best thing since sliced bread (which ranks just below printk()).
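
As a rough illustration of the "userspace or kernel?" question, here is a
minimal Python sketch that samples /proc/<pid>/stat twice and compares the
utime and stime deltas. It is only a cheap first look, not a substitute for
strace or perf. The helper names (parse_cpu_ticks, classify, sample) are
invented for this example, and the field positions assume the layout
documented in proc(5), where utime and stime are fields 14 and 15.

```python
import time


def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The comm field (2nd) is parenthesised and may itself contain spaces
    # or ')', so cut at the last ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    return int(rest[11]), int(rest[12])


def classify(user_delta, sys_delta):
    """Say where the CPU time went during the sampling interval."""
    if user_delta + sys_delta == 0:
        return "idle"
    # A tie is reported as userspace; that choice is arbitrary.
    return "mostly kernel" if sys_delta > user_delta else "mostly userspace"


def sample(pid, interval=1.0):
    """Read the stat file twice and return (user_delta, sys_delta) in ticks."""
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        u0, s0 = parse_cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        u1, s1 = parse_cpu_ticks(f.read())
    return u1 - u0, s1 - s0
```

With the pid of the stuck rm in hand, classify(*sample(pid)) gives a hint:
"mostly userspace" suggests strace is the next stop, while "mostly kernel"
suggests going straight to perf top -g -p <pid>.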

-Mike
