Re: 2.6.35-rc4-git3: Reported regressions from 2.6.34

From: Linus Torvalds
Date: Thu Jul 08 2010 - 21:40:27 EST


On Thu, Jul 8, 2010 at 4:33 PM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>
> Unresolved regressions
> ----------------------
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16353
> Subject         : 2.6.35 regression
> Submitter       : Zeev Tarantov <zeev.tarantov@xxxxxxxxx>
> Date            : 2010-07-05 13:04 (4 days old)
> Message-ID      : <loom.20100705T144459-919@xxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127836002702522&w=2

This is a gcc-4.5 issue. Whether it's also something that we should
change in the kernel is unclear, but at least as of now, the rule is
that you cannot compile the kernel with gcc-4.5. No idea whether the
compiler is just entirely broken, or whether it's just that it
triggers something iffy by being overly clever.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16346
> Subject         : 2.6.35-rc3-git8 - include/linux/fdtable.h:88 invoked rcu_dereference_check() without protection!
> Submitter       : Miles Lane <miles.lane@xxxxxxxxx>
> Date            : 2010-07-04 22:04 (5 days old)
> Message-ID      : <AANLkTinof0k28rk4rMr66aubxcRL2rFa5ZEArj1lqD3o@xxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127828107815930&w=2

I'm not entirely sure if these RCU proving things should count as regressions.

Sure, the option to enable RCU proving is new, but the things it
reports are generally not new - and they are usually not even
bugs in the sense that they necessarily cause any real problems.

That particular one is in the single-thread optimized case for fget_light(), i.e.

	if (likely((atomic_read(&files->count) == 1))) {
		file = fcheck_files(files, fd);

where I think it should be entirely safe in all ways without any
locking. So I think it's a false positive too.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16334
> Subject         : reiserfs locking (v2)
> Submitter       : Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
> Date            : 2010-07-02 9:34 (7 days old)
> Message-ID      : <20100702093451.GA3973@xxxxxxxxxxxxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127806306303590&w=2

Frederic? Al? I assume this is some late fallout from the BKL removal
ages ago.. It's the old filldir-vs-mmap crud, but normally it should
be impossible to trigger because the inode for a directory should
never be mmap'able, so we should never have the same i_mutex lock used
for both mmap and for filldir protection.

We saw some of that oddity long ago, I wonder if it's lockdep being
confused about some inodes.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16333
> Subject         : iwl3945: HARDWARE GONE??
> Submitter       : Priit Laes <plaes@xxxxxxxxx>
> Date            : 2010-07-02 16:02 (7 days old)
> Message-ID      : <1278086575.2889.8.camel@chi>
> References      : http://marc.info/?l=linux-kernel&m=127808659705983&w=2

This either got fixed, or will be practically impossible to debug. The
reporter ended up being unable to reproduce the issue.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16332
> Subject         : Kernel crashes in tty code (tty_open)
> Submitter       : werner@xxxxxxxxxxxxx
> Date            : 2010-07-02 3:34 (7 days old)
> Message-ID      : <1278041650.12788@xxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127804167511930&w=2

This seems to be due to CONFIG_MRST (Moorestown).

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16330
> Subject         : Dynamic Debug broken on 2.6.35-rc3?
> Submitter       : Thomas Renninger <trenn@xxxxxxx>
> Date            : 2010-07-01 15:44 (8 days old)
> Message-ID      : <201007011744.19564.trenn@xxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127799907218877&w=2

There's a suggested patch in

http://marc.info/?l=linux-kernel&m=127862524404291&w=2

but no reply to it yet.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16329
> Subject         : 2.6.35-rc3: Load average climbing to 3+ with no apparent reason: CPU 98% idle, with hardly no I/O
> Submitter       : Török Edwin <edwintorok@xxxxxxxxx>
> Date            : 2010-07-01 7:40 (8 days old)
> Message-ID      : <20100701104022.404410d6@debian>
> References      : http://marc.info/?l=linux-kernel&m=127797005030536&w=2

This seems to be partly a confusion about what "load average" is. It's
not a CPU load, it's a system load average, and disk-wait processes
count towards it. He has some problem with his CD-ROM, and it sounds
like it might be hardware on the verge of going bad.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16324
> Subject         : Oops while running fs_racer test on a POWER6 box against latest git
> Submitter       : divya <dipraksh@xxxxxxxxxxxxxxxxxx>
> Date            : 2010-06-30 11:34 (9 days old)
> Message-ID      : <4C2B28F3.7000006@xxxxxxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127789697303061&w=2

I wonder if this is the writeback problem. That POWER crash dump is
unreadable, so it's hard to tell, but the load in question makes that
at least plausible.

If so, it should hopefully be fixed in today's git (commit
83ba7b071f30f7c01f72518ad72d5cd203c27502 and friends).

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16323
> Subject         : 2.6.35-rc3-git4 - kernel/sched.c:616 invoked rcu_dereference_check() without protection!
> Submitter       : Miles Lane <miles.lane@xxxxxxxxx>
> Date            : 2010-07-01 12:21 (8 days old)
> Message-ID      : <AANLkTini6hz2LFeZi8CMUmY3xw1MU7NxmyesuxZ4oCdo@xxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127798693125541&w=2

See earlier about these being marked as regressions, but it should be
fixed by commit dc61b1d6 ("sched: Fix PROVE_RCU vs cpu_cgroup").

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16322
> Subject         : WARNING: at /arch/x86/include/asm/processor.h:1005 read_measured_perf_ctrs+0x5a/0x70()
> Submitter       : boris64 <bugzilla.kernel.org@xxxxxxxxxxx>
> Date            : 2010-07-01 13:54 (8 days old)
> Handled-By      : H. Peter Anvin <hpa@xxxxxxxxx>

Magic. Strange and dark magic.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16311
> Subject         : [REGRESSION][SUSPEND] 2.6.35-rcX won't suspend Lenovo W500 laptop
> Submitter       : Shawn Starr <shawn.starr@xxxxxxxxxx>
> Date            : 2010-06-28 0:45 (11 days old)
> Message-ID      : <201006272045.17004.shawn.starr@xxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127768633705286&w=2

I think this might be usefully bisected. Shawn?

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16309
> Subject         : 2.6.35-rc3 oops trying to suspend.
> Submitter       : Andrew Hendry <andrew.hendry@xxxxxxxxx>
> Date            : 2010-06-27 12:40 (12 days old)
> Message-ID      : <AANLkTinUH2p33-AWxOVDrLsNkn9rgEVrlwn5mfK7P8NH@xxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127764249926781&w=2

I'm pretty sure this was fixed by Nick in commit 57439f878afa ("fs:
fix superblock iteration race").

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16307
> Subject         : i915 in kernel 2.6.35-rc3, high number of wakeups
> Submitter       : Enrico Bandiello <enban@xxxxxxxxxxxx>
> Date            : 2010-06-26 16:57 (13 days old)
> Message-ID      : <4C26317A.5070309@xxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127757403404259&w=2

I don't think anybody noticed this one. Jesse?

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16304
> Subject         : i915 - high number of wakeups
> Submitter       : Enrico Bandiello <enban@xxxxxxxxxxxx>
> Date            : 2010-06-27 09:52 (12 days old)

Duplicate of that 16307 one.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16284
> Subject         : Hitting WARN_ON in hw_breakpoint code
> Submitter       : Paul Mackerras <paulus@xxxxxxxxx>
> Date            : 2010-06-23 12:57 (16 days old)
> Message-ID      : <20100623125740.GA3368@xxxxxxxxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127729789113432&w=2

This has "I have a fix, will post it very soon." in the thread from
Frederic, but I'm not seeing anything else. Frederic?

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16265
> Subject         : Why is kslowd accumulating so much CPU time?
> Submitter       : Theodore Ts'o <tytso@xxxxxxx>
> Date            : 2010-06-09 18:36 (30 days old)
> First-Bad-Commit: http://git.kernel.org/linus/fbf81762e385d3d45acad057b654d56972acf58c
> Message-ID      : <E1OMQ88-0002a1-Gb@xxxxxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127610857819033&w=4

Dave, Jesse?

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16234
> Subject         : [2.6.35-rc3] reboot mutex 'bug'...
> Submitter       : Daniel J Blueman <daniel.blueman@xxxxxxxxx>
> Date            : 2010-06-14 15:16 (25 days old)
> Message-ID      : <AANLkTimDcTnyEPmt2ZcCM1UWtn4AYKotiqyjobJApkO7@xxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127652861118933&w=2

Ok, this is definitely harmless. Whether we should silence the warning
somehow is a separate question.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16230
> Subject         : inconsistent IN-HARDIRQ-W -> HARDIRQ-ON-W usage: fasync, 2.6.35-rc3
> Submitter       : Dominik Brodowski <linux@xxxxxxxxxxxxxxxxxxxx>
> Date            : 2010-06-13 9:53 (26 days old)
> Message-ID      : <20100613095305.GA13231@xxxxxxxxxxxxxxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127642282208277&w=2

Fixed by commit f4985dc714d7.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16228
> Subject         : BUG/boot failure on Dell Precision T3500 (pci/ahci_stop_engine)
> Submitter       : Brian Bloniarz <phunge0@xxxxxxxxxxx>
> Date            : 2010-06-16 17:57 (23 days old)
> Handled-By      : Bjorn Helgaas <bjorn.helgaas@xxxxxx>

This has a butt-ugly suggested patch that certainly won't be applied.
I saw the thread, but lost sight of it. Jesse, did that end up with
some resolution?

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16221
> Subject         : 2.6.35-rc2-git5 -- [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
> Submitter       : Miles Lane <miles.lane@xxxxxxxxx>
> Date            : 2010-06-11 20:31 (28 days old)
> Message-ID      : <AANLkTim0jVRyqkwlGOcrg_XTvUQwcBYfWJX-aRzkkrLG@xxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127628828119623&w=2

I dunno. Old, and apparently seen by two people. Dave?

Might be helped by bisection.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16205
> Subject         : acpi: freeing invalid memtype bf799000-bf79a000
> Submitter       : Marcin Slusarz <marcin.slusarz@xxxxxxxxx>
> Date            : 2010-06-09 20:09 (30 days old)
> Message-ID      : <20100609200910.GA2876@xxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127611427029914&w=2
>                  http://marc.info/?l=linux-kernel&m=127688398513862&w=2

This should be fixed by commit b945d6b2554d ("rbtree: Undo augmented
trees performance damage and regression").

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16199
> Subject         : 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!
> Submitter       : Miles Lane <miles.lane@xxxxxxxxx>
> Date            : 2010-06-07 18:14 (32 days old)
> Message-ID      : <AANLkTin2pPqOUx--9fIX3BH3e-cU6oCRufijcx_4ozx5@xxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127593447812015&w=2

Another RCU proving thing. This one looks the same as the 16323
one above, and should be fixed by the same commit.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16197
> Subject         : [BUG on 2.6.35-rc2] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:11.0/0000:02:03.0/slot'
> Submitter       : Ryan Wang <openspace.wang@xxxxxxxxx>
> Date            : 2010-06-07 0:23 (32 days old)
> Message-ID      : <AANLkTincwMZPnYW3S4uz4k2GOn52RpgBIBRfzyD010Yo@xxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127587022219378&w=2

These should all be gone. See commit 3be434f0244ee by Jesse ('Revert
"PCI: create function symlinks in /sys/bus/pci/slots/N/"').

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16187
> Subject         : Carrier detection failed in dhcpcd when link is up
> Submitter       : Christian Casteyde <casteyde.christian@xxxxxxx>
> Date            : 2010-06-12 15:15 (27 days old)
> First-Bad-Commit: http://git.kernel.org/linus/10708f37ae729baba9b67bd134c3720709d4ae62
> Handled-By      : Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

David? This bisects to a networking commit. Doesn't look sensible, but
what do I know?

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16184
> Subject         : Container, X86-64, i386, iptables rule
> Submitter       : Jean-Marc Pigeon <jmp@xxxxxxx>
> Date            : 2010-06-12 04:17 (27 days old)
> Handled-By      : Patrick McHardy <kaber@xxxxxxxxx>

Patrick, Davem? Ping?

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16179
> Subject         : 2.6.35-rc2 completely hosed on intel gfx?
> Submitter       : Norbert Preining <preining@xxxxxxxx>
> Date            : 2010-06-06 11:55 (33 days old)
> Message-ID      : <20100606115534.GA9399@xxxxxxxxxxxxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127582534931581&w=2

Hmm. That one is the vt.c bug coupled with another problem, which in
turn got opened as a separate bugzilla entry:

http://bugzilla.kernel.org/show_bug.cgi?id=16252

which in turn then got closed. I dunno.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16175
> Subject         : 2.6.35-rc1 system oom, many processes killed but memory not free
> Submitter       : andrew hendry <andrew.hendry@xxxxxxxxx>
> Date            : 2010-06-05 0:46 (34 days old)
> Message-ID      : <AANLkTim7CiW-yfugZUAHZCqLvXKgt9CwolCvbLGdCLAk@xxxxxxxxxxxxxx>
> References      : http://marc.info/?l=linux-kernel&m=127569877714937&w=2

Not a regression or a kernel bug at all. See the thread. A big ramdisk
ended up using all of memory once the builds filled it.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16145
> Subject         : Unable to boot unless "notsc" or "clocksource=hpet", or acpi_pad disabling the TSC
> Submitter       : Tom Gundersen <teg@xxxxxxx>
> Date            : 2010-06-07 13:11 (32 days old)
> Handled-By      : Venkatesh Pallipadi <venki@xxxxxxxxxx>
>                  Len Brown <lenb@xxxxxxxxxx>

This is not a regression. See the full bugzilla details. The same
problem persists at least back to 2.6.30 with his config, so something
specific to his particular config requires "notsc" to boot.

> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16122
> Subject         : 2.6.35-rc1: WARNING at fs/fs-writeback.c:1142 __mark_inode_dirty+0x103/0x170
> Submitter       : Larry Finger <Larry.Finger@xxxxxxxxxxxx>
> Date            : 2010-06-04 13:18 (35 days old)
> Handled-By      : Jens Axboe <axboe@xxxxxxxxx>

This looks like a duplicate of that 16312 bugzilla entry. Jens, has
this been resolved?

Linus