Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning, regression?

From: Sedat Dilek
Date: Thu Apr 28 2011 - 11:28:32 EST


On Thu, Apr 28, 2011 at 3:30 PM, Mike Galbraith <efault@xxxxxx> wrote:
> On Thu, 2011-04-28 at 03:26 -0700, Paul E. McKenney wrote:
>> On Thu, Apr 28, 2011 at 11:45:03AM +0200, Sedat Dilek wrote:
>> > Hi,
>> >
>> > not sure if my problem from linux-2.6-rcu.git#sedat.2011.04.23a is
>> > related to the issue here.
>> >
>> > Just FYI:
>> > I am here on a Pentium-M (uniprocessor aka UP) and still unsure if I
>> > have the correct (optimal?) kernel-configs set.
>> >
>> > Paul gave me a script to collect RCU data and I enhanced it with
>> > collecting SCHED data.
>> >
>> > In the above mentionned GIT branch I applied these two extra commits
>> > (0001 requested by Paul and 0002 proposed by Thomas):
>> >
>> > patches/0001-Revert-rcu-restrict-TREE_RCU-to-SMP-builds-with-PREE.patch
>> > patches/0002-sched-Add-warning-when-RT-throttling-is-activated.patch
>> >
>> > Furthermore, I have added my kernel-config file, scripts, patches and
>> > logs (also output of 'cat /proc/cpuinfo').
>> >
>> > Hope this helps the experts to narrow down the problem.
>>
>> Yow!!!
>>
>> Now this one might well be able to hit the 950 millisecond limit.
>> There are no fewer than 1,314,958 RCU callbacks queued up at the end of
>> the test. ÂAnd RCU has indeed noticed this and cranked up the number
>> of callbacks to be handled by each invocation of rcu_do_batch() to
>> 2,147,483,647. ÂAnd only 15 seconds earlier, there were zero callbacks
>> queued and the rcu_do_batch() limit was at the default of 10 callbacks
>> per invocation.
>
> Yeah, yow. ÂOnce the RT throttle hit, it stuck.
>
> Â.clock             : 1386824.201768
> Â.rt_nr_running         : 2
> Â.rt_throttled         Â: 1
> Â.rt_time            : 950.132427
> Â.rt_runtime          Â: 950.000000
> Â Â Â Â Â rcuc0 Â Â 7 Â Â Â Â 0.034118 Â Â 10857 Â Â98 Â Â Â Â 0.034118 Â Â Â1472.309646 Â Â Â Â 0.000000 /
> FF Â Â1 Â Â Â1 R Â ÂR 0 [rcuc0]
> Â.clock             : 1402450.997994
> Â.rt_nr_running         : 2
> Â.rt_throttled         Â: 1
> Â.rt_time            : 950.132427
> Â.rt_runtime          Â: 950.000000
> Â Â Â Â Â rcuc0 Â Â 7 Â Â Â Â 0.034118 Â Â 10857 Â Â98 Â Â Â Â 0.034118 Â Â Â1472.309646 Â Â Â Â 0.000000 /
> FF Â Â1 Â Â Â1 R Â ÂR 0 [rcuc0]
>
> ...
>
> Â.clock             : 2707432.862374
> Â.rt_nr_running         : 2
> Â.rt_throttled         Â: 1
> Â.rt_time            : 950.132427
> Â.rt_runtime          Â: 950.000000
> Â Â Â Â Â rcuc0 Â Â 7 Â Â Â Â 0.034118 Â Â 10857 Â Â98 Â Â Â Â 0.034118 Â Â Â1472.309646 Â Â Â Â 0.000000 /
> FF Â Â1 Â Â Â1 R Â ÂR 0 [rcuc0]
> Â.clock             : 2722572.958381
> Â.rt_nr_running         : 2
> Â.rt_throttled         Â: 1
> Â.rt_time            : 950.132427
> Â.rt_runtime          Â: 950.000000
> Â Â Â Â Â rcuc0 Â Â 7 Â Â Â Â 0.034118 Â Â 10857 Â Â98 Â Â Â Â 0.034118 Â Â Â1472.309646 Â Â Â Â 0.000000 /
> FF Â Â1 Â Â Â1 R Â ÂR 0 [rcuc0]
>
>

Hi,

OK, I tried with the patch proposed by Thomas (0003):

patches/0001-Revert-rcu-restrict-TREE_RCU-to-SMP-builds-with-PREE.patch
patches/0002-sched-Add-warning-when-RT-throttling-is-activated.patch
patches/0003-sched-Remove-skip_clock_update-check.patch

>From the very beginning it looked as the system is "stable" due to:

.rt_nr_running : 0
.rt_throttled : 0

This changed when I started a simple tar-job to save my kernel
build-dir to an external USB-hdd.
From...

.rt_nr_running : 1
.rt_throttled : 1

...To:

.rt_nr_running : 2
.rt_throttled : 1

Unfortunately, reducing all activities to a minimum load, did not
change from last known RT throttling state.

Just noticed rt_time exceeds the value of 950 first time here:

.rt_nr_running : 1
.rt_throttled : 1
.rt_time : 950.005460

Full data attchached as tarball.

- Sedat -

P.S.: Excerpt from
collectdebugfs-v2_2.6.39-rc3-rcutree-sedat.2011.04.23a+.log (0:0 ->
1:1 -> 2:1)

--
rt_rq[0]:
.rt_nr_running : 0
.rt_throttled : 0
.rt_time : 888.893877
.rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio
exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
R cat 2652 115108.993460 1 120
115108.993460 1.147986 0.000000 /
--
rt_rq[0]:
.rt_nr_running : 1
.rt_throttled : 1
.rt_time : 950.005460
.rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio
exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
rcuc0 7 0.000000 56869 98
0.000000 981.385605 0.000000 /
--
rt_rq[0]:
.rt_nr_running : 2
.rt_throttled : 1
.rt_time : 950.005460
.rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio
exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
rcuc0 7 0.000000 56869 98
0.000000 981.385605 0.000000 /
--

Attachment: from-dileks-2.tar.xz
Description: Binary data

Attachment: from-dileks-2.tar.xz.sha256sum
Description: Binary data