Re: linux-next: Tree for Jun 28 [ BISECTED: rsyslog/imklog: High CPUusage ]

From: Sedat Dilek
Date: Sun Jun 30 2013 - 07:50:39 EST


On Sat, Jun 29, 2013 at 11:44 PM, Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> On Fri, Jun 28, 2013 at 9:33 AM, Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx> wrote:
>> Hi all,
>>
>> Changes since 20130627:
>>
>> This tree produces the following warning when built for many (all?)
>> configs (it has been fixed in the drm tree):
>>
>> drivers/video/Kconfig:42:error: recursive dependency detected!
>> drivers/video/Kconfig:42: symbol FB is selected by DRM_KMS_HELPER
>> drivers/gpu/drm/Kconfig:29: symbol DRM_KMS_HELPER is selected by DRM_OMAP
>> drivers/gpu/drm/omapdrm/Kconfig:2: symbol DRM_OMAP depends on FB_OMAP2
>> drivers/video/omap2/omapfb/Kconfig:1: symbol FB_OMAP2 depends on FB
>>
>> The arm-soc tree gained a conflict against the mfd tree.
>>
>> I applied a patch to the akpm tree to remove a lot of warnings.
>>
>
> [ I kept the CC list of the culprit commit (sorry if I forgot anyone) ]
>
> Hi,
>
> [ SYMPTOM ]
>
> Last Friday I wanted to listen to some music-videos on my
> next-20130628 kernel and noticed that my battery got sucked empty very
> fast.
> I could see with htop that 2 (of 4) rsyslogd processes ran on 100%.
>
> NOTE: next-20130627 (Thursday's Linux-Next) did NOT show these symptoms.
>
> [ "CSI dileks" CHECKS THE LOGS ]
>
> While investigating the logs I saw this:
>
> [ /var/log/syslog ]
> Jun 29 15:25:34 fambox rsyslogd: [origin software="rsyslogd"
> swVersion="5.8.6" x-pid="900" x-info="http://www.rsyslog.com";] exiting
> on signal 15.
> Jun 29 15:25:39 fambox kernel: imklog 5.8.6, log source = /proc/kmsg started.
> Jun 29 15:25:39 fambox rsyslogd: [origin software="rsyslogd"
> swVersion="5.8.6" x-pid="4689" x-info="http://www.rsyslog.com";] start
> Jun 29 15:25:39 fambox rsyslogd: rsyslogd's groupid changed to 103
> Jun 29 15:25:39 fambox rsyslogd: rsyslogd's userid changed to 101
> Jun 29 15:25:39 fambox rsyslogd-2039: Could not open output pipe
> '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
> Jun 29 15:25:39 fambox kernel:
> Jun 29 15:26:09 fambox kernel: last message repeated 30176 times
> Jun 29 15:27:09 fambox kernel: last message repeated 59603 times
> Jun 29 15:28:09 fambox kernel: last message repeated 60081 times
> Jun 29 15:29:09 fambox kernel: last message repeated 60148 times
>
> [ 1ST TRYOUT ]
>
> [1] recommends to use a more modern rsyslog-version, so I upgraded
> from (5.8.6-1ubuntu8.4) to (7.4.1-0ubuntu1ppa1) from an unofficial
> 3rd-party software-archive [2] for Ubuntu/precise.
> This did not help - still 100% CPU usage observed.
>
> [ 2ND TRYOUT: WORKAROUND ]
>
> Following [1], I temporarily disabled imklog rsyslog-module:
>
> [ /etc/rsyslog.conf ]
> ...
> -$ModLoad imklog # provides kernel logging support (previously done by rklogd)
> +# XXX: Workaround: Do NOT load imklog as it causes 100% CPU workload
> by flooding logs. -dileks
> +#$ModLoad imklog # provides kernel logging support (previously done
> by rklogd)
>
> Yeah, this resulted in a "normal" system - no high CPU loads anymore
> (in idle-mode).
>
> [ CURIOSITY KILLED THE CAT ]
>
> I cannot say if this here is related...
>
> [ /var/log/syslog ]
> fambox rsyslogd-2039: Could not open output pipe '/dev/xconsole' [try
> http://www.rsyslog.com/e/2039 ]
>
> ...it looks like I have to create a '/dev/xconsole' manually.
>
> root# mknod -m 640 /dev/xconsole p
>
> With the "BROKEN" kernel this IIRC did not help, the good one did not
> show xconsole-related stuff in the logs.
> But I might remember wrong.
>
> See also:
>
> [ /etc/rsyslog.d/50-default.conf ]
> ...
> # The named pipe /dev/xconsole is for the `xconsole' utility. To use it,
> # you must invoke `xconsole' with the `-file' option:
> #
> # $ xconsole -file /dev/xconsole [...]
> #
> # NOTE: adjust the list below, or you'll go crazy if you have a reasonably
> # busy site..
> #
> daemon.*;mail.*;\
> news.err;\
> *.=debug;*.=info;\
> *.=notice;*.=warn |/dev/xconsole
> - EOF -
>
> Running manually 'xconsole -file /dev/xconsole' was not successful
> (could not open... device) with the bad kernel.
> Might be good to comment that xconsole block?
>
> [ GIT-BISECT SESSION ]
>
> As this all did not show me what caused the problem I started a
> git-bisect session.
>
> This revealed the following culprit commit:
>
> commit bb1f30cb7d3ba21098f0ee7e0382160ba2599a43
> "wait: introduce wait_event_common(wq, condition, state, timeout)"
>
> This commit was 1st introduced with Friday's next-20130628!
>
> NOTE: See also git-bisect-log.txt and git-bisect-visualize.txt for the results.
>
> Reverting it successfully with two other related patches results in a
> "normal" system again.
>
> See attached patch lists:
> Revert "fix warnings from ?: operator in wait.h"
> Revert "wait: introduce prepare_to_wait_event()"
> Revert "wait: introduce wait_event_common(wq, condition, state, timeout)"
>
> [ GIVEAWAYS ]
>
> For the sake of completeness I have attached my kernel-config.
>
> Oleg, can you look at that?
> If you need more inputs lemme know.
>
> Thanks in advance.
>
> Regards,
> - Sedat -
>
> [1] http://www.gossamer-threads.com/lists/rsyslog/users/8087
> [2] https://launchpad.net/~tmortensen/+archive/rsyslogv7

Hi Andrew, Hi Stephen,

there seems to be a small hunk in wait.h to be different compared with
the wait.h-patches in next-20130628:

diff --git a/include/linux/wait.h b/include/linux/wait.h
index f3b793d..0a5573e 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -226,12 +226,13 @@ wait_queue_head_t *bit_waitqueue(void *, int);
if (condition) { \
__ret = __wait_no_timeout(tout); \
if (!__ret) { \
- __ret = (tout); \
+ __ret = tout; \
if (!__ret) \
__ret = 1; \
} \
- } else \
+ } else { \
__ret = __wait_event_common(wq, condition, state, tout);\
+ } \
__ret; \
})

I have attached the triple patchset from mmots whereas 3-3 as v2.

The high CPU workload still remains.

Regards,
- Sedat -

Attachment: 1-3-wait-introduce-wait_event_commonwq-condition-state-timeout.patch
Description: Binary data

Attachment: 2-3-wait-introduce-wait_event_commonwq-condition-state-timeout-fix.patch
Description: Binary data

Attachment: 3-3-wait-introduce-prepare_to_wait_event-v2.patch
Description: Binary data