Re: Internal error xfs_trans_cancel

From: Dave Chinner
Date: Thu Jun 02 2016 - 02:35:49 EST


On Thu, Jun 02, 2016 at 07:23:24AM +0200, Daniel Wagner wrote:
> > posix03 and posix04 just emit error messages:
> >
> > posix04 -n 40 -l 100
> > posix04: invalid option -- 'l'
> > posix04: Usage: posix04 [-i iterations] [-n nr_children] [-s] <filename>
> > .....
>
> I screwed this up. I have patched my version of lockperf so that
> all tests use the same option names, but I forgot to send those
> patches. Will do now.
>
> In this case you can use '-i' instead of '-l'.
>
> > So I changed them to run "-i $l" instead, and that has a somewhat
> > undesired effect:
> >
> > static void
> > kill_children()
> > {
> > siginfo_t infop;
> >
> > signal(SIGINT, SIG_IGN);
> >>>>>> kill(0, SIGINT);
> > while (waitid(P_ALL, 0, &infop, WEXITED) != -1);
> > }
> >
> > Yeah, kill(0, SIGINT) sends a SIGINT to every process in the
> > caller's process group, so it kills the parent shell:
>
> Ah, that rings a bell. I tuned the parameters so that I did not run into
> this problem. I'll do a patch for this one. It's pretty annoying.
>
> > $ ./run-lockperf-tests.sh /mnt/scratch/
> > pid 9597's current affinity list: 0-15
> > pid 9597's new affinity list: 0,4,8,12
> > sh: 1: cannot create /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor: Directory nonexistent
> > posix01 -n 8 -l 100
> > posix02 -n 8 -l 100
> > posix03 -n 8 -i 100
> >
> > $
> >
> > So, I've just removed those tests from your script. I'll see if I
> > have any luck with reproducing the problem now.
>
> I was able to reproduce it again with the same steps.

Hmmm, Ok. I've been running the lockperf test and kernel builds all
day on a filesystem that is identical in shape and size to yours
(i.e. xfs_info output is the same) but I haven't reproduced it yet.
Is it possible to get a metadump image of your filesystem to see if
I can reproduce it on that?
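For reference, the kill_children() problem quoted above comes from kill(0, sig), which POSIX defines as "signal every process in the caller's process group", and that group includes the shell running the script. A negative pid, kill(-pgid, sig), signals only group pgid. The following is a minimal sketch, assuming the fix is to move the worker children into a process group of their own; it is an illustration, not the lockperf patch Daniel mentions, and spawn_child() and child_pgid are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: pgid of the group holding only the workers (0 = not yet created). */
static pid_t child_pgid = 0;

/* Fork one worker; the first worker becomes the leader of a new group. */
static pid_t spawn_child(void)
{
	pid_t pid = fork();

	if (pid < 0) {
		perror("fork");
		exit(EXIT_FAILURE);
	}
	if (pid == 0) {
		/* Child: setpgid(0, 0) creates a group, otherwise join the existing one. */
		setpgid(0, child_pgid);
		for (;;)
			pause();	/* default SIGINT action terminates us */
	}
	/* Parent sets it too, so the child is in the group before we return. */
	if (child_pgid == 0)
		child_pgid = pid;
	setpgid(pid, child_pgid);
	return pid;
}

/* Signal only the workers' group, then reap them; returns the number reaped. */
static int kill_children(void)
{
	siginfo_t infop;
	int reaped = 0;

	if (child_pgid > 0)
		kill(-child_pgid, SIGINT);	/* negative pid: that group only */
	while (waitid(P_ALL, 0, &infop, WEXITED) != -1)
		reaped++;
	return reaped;
}
```

Because the parent never joins the workers' group, it no longer needs the signal(SIGINT, SIG_IGN) dance from the original helper, and the invoking shell is left alone.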

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx