Re: Linux 2.6.21

From: Adrian Bunk
Date: Thu Apr 26 2007 - 14:36:19 EST


On Thu, Apr 26, 2007 at 02:04:56PM -0400, Jeff Garzik wrote:
> IMO, the closer you look, the more warts you find. Before you started
> doing your work with kernel regressions, no one was really tracking it. I
> bet you have helped cut down on the regressions, but I have no good way to
> quantify my gut feeling.
>
> Additional comments on developers and fixing regressions:
>
> * Sometimes seeing a long list, people's eyes glaze over. It's just human
> nature. A long list also gives us no idea of scale, or severity. I bet a
> weekly "top 10 bugs and regressions" email would help focus developer
> attention.

Top 10 ordered by what?

I always tried to put the most mysterious ones like "kwin dies silently"
or "gammu no longer works" at the top of my lists hoping some developer
might become interested in getting clues about them.

But when there were 15 outstanding suspend regressions, these were also
important.

And how do you rank the rtl8139 netdriver regression reported one month
ago against the snd_hda_intel regression reported one month ago?

Both are "only" drivers no longer working for some users, but there
might be many more users for whom they don't work in 2.6.21 because the
maintainers didn't bother to debug and fix them.

Ideally, maintainers should debug and fix regressions (and other bugs)
without requiring weekly regression lists.

If bug reports and weekly reminders aren't enough, I don't think
increasing the level of babysitting would make sense.

> * To be effective, lists, either long or top-10, must be pruned if you get
> a sense that only one user is affected. [With oopses and BUGs as a clear
> exception,] many problems benefit from at least two users reporting a bug.

First of all, we had several cases where one report by one user resulted
in a serious bug being fixed. There might be only one -rc tester with a
strange workload or some unusual hardware.

A bigger point is that "at least two users reporting a bug" assumes you
always know whether two bug reports are for the same bug.
Some examples of the problems:
- during early 2.6.21-rc, it was not unusual for users to run into
  three unrelated suspend-related regressions on one computer
- during 2.6.20-rc, we had 4 similar-looking reports, 3 for one
  regression, and the fourth for a completely different regression
- during 2.6.21-rc, it turned out the following reports were caused
  by the same regression:
    kwin dies silently
    KDE problem if a sound is played while the display is suspended

> * It gets a bit tiresome to field the large number of driver bug reports
> that eventually turn out to be related to broken interrupt handling
> somehow. I think we developers need to get better at showing users how to
> isolate driver vs. PCI/ACPI/core bugs. Maybe drivers need to start
> introducing interrupt delivery tests into their probe code. Overall,
> broken interrupt handling manifests in several ways, most of which
> initially appear symptomatic of a broken driver.

Regressions have the advantage of being able to compare with a
known-good kernel.

If a driver works with 2.6.20 but doesn't work with 2.6.21-rc, ask the
submitter for the dmesg output from both kernels and diff the two. In my
experience that usually gives a good indication whether or not it's an
interrupt-related ACPI/PCI issue.
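
A minimal sketch of that comparison, assuming both dmesg outputs have been
saved as text. The function names and the keyword list are illustrative
assumptions, not an established tool; the timestamp pattern covers the
"[   12.345678] " prefix printed when printk timestamps are enabled.

```python
import difflib
import re

# Optional printk timestamp at the start of a line, e.g. "[   12.345678] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Hypothetical keyword list hinting at interrupt routing; tune as needed.
IRQ_HINTS = re.compile(r"irq|interrupt|apic|acpi|msi|gsi", re.IGNORECASE)


def normalize(log_text):
    """Strip timestamps and trailing whitespace so only real changes differ."""
    return [TIMESTAMP.sub("", line).rstrip() for line in log_text.splitlines()]


def diff_dmesg(good_text, bad_text, good_name="good", bad_name="bad"):
    """Return unified-diff lines between the known-good and the bad dmesg."""
    return list(
        difflib.unified_diff(
            normalize(good_text),
            normalize(bad_text),
            fromfile=good_name,
            tofile=bad_name,
            lineterm="",
            n=0,  # no context lines: only the changed lines are of interest
        )
    )


def irq_related_changes(diff_lines):
    """Keep added/removed lines that mention an interrupt-related keyword."""
    body = diff_lines[2:]  # skip the two "---"/"+++" file header lines
    return [
        line
        for line in body
        if line.startswith(("+", "-")) and IRQ_HINTS.search(line)
    ]
```

Feeding the two saved logs through diff_dmesg() and then
irq_related_changes() leaves only the added or removed lines that mention
IRQ, APIC, ACPI or MSI. Those are only a starting point for a closer look;
a changed IRQ number or a missing routing message still needs a human to
judge it.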

> Jeff

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
