Re: [bugzilla-daemon@xxxxxxxxxxxxxxxxxxx: [Bug 208507] New: BISECTED: i2c timeout loading module ddbridge with commit d2345d1231d80ecbea5fb764eb43123440861462]

From: Daniel Scheller
Date: Wed Jan 25 2023 - 15:26:13 EST


On Wed, 25 Jan 2023 11:05:06 -0600,
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> [+cc Salvatore, Mauro, Daniel, linux-media]
>
> On Thu, Jul 09, 2020 at 02:17:22PM -0500, Bjorn Helgaas wrote:
> > Bisected to Debian commit d2345d1231d8, which is a backport of the
> > upstream commit b88bf6c3b6ff ("PCI: Add boot interrupt quirk
> > mechanism for Xeon chipsets").
> >
> > Reporter confirmed that reverting the Debian backport from 4.19.132
> > fixes the problem.
> >
> > ----- Forwarded message from bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
> > -----
> >
> > Date: Thu, 09 Jul 2020 15:01:11 +0000
> > From: bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
> > To: bjorn@xxxxxxxxxxx
> > Subject: [Bug 208507] New: BISECTED: i2c timeout loading module
> > ddbridge with commit d2345d1231d80ecbea5fb764eb43123440861462
> > Message-ID: <bug-208507-41252@xxxxxxxxxxxxxxxxxxxxxxxxx/>
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=208507
> >
> > Bug ID: 208507
> > Summary: BISECTED: i2c timeout loading module ddbridge
> > with commit d2345d1231d80ecbea5fb764eb43123440861462
> > Product: Drivers
> > Version: 2.5
> > Kernel Version: 4.19.132
> > Hardware: x86-64
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: PCI
> > Assignee: drivers_pci@xxxxxxxxxxxxxxxxxxxx
> > Reporter: bernhard@xxxxxxxxxx
> > Regression: Yes
> >
> > Created attachment 290179
> > -->
> > https://bugzilla.kernel.org/attachment.cgi?id=290179&action=edit
> > dmesg on 4.19.132
> >
> > OS: Debian 10.4 Buster
> > CPU: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz
> > Hardware: Supermicro Super Server
> > Mainboard: Supermicro X10SDV
> > DVB card: Digital Devices Cine S2 V7 Advanced DVB adapter
> >
> > Issue:
> > =====
> > Loading kernel module ddbridge fails with i2c timeouts, see
> > attached dmesg. The dvb media adapter is unusable.
> > This happened after Linux kernel upgrade from 4.19.98-1+deb10u1 to
> > 4.19.118-2+deb10u1.
> >
> > A git bisect based on the Debian kernel repo on branch buster
> > identified as first bad commit:
> > [1fb0eb795661ab9e697c3a053b35aa4dc3b81165] Update to 4.19.116.
> >
> > Another git bisect based on upstream Linux kernel repo on branch
> > v4.19.y identified as first bad commit:
> > [d2345d1231d80ecbea5fb764eb43123440861462] PCI: Add boot interrupt
> > quirk mechanism for Xeon chipsets.
> >
> > Other affected Debian kernel version: 5.6.14+2~bpo10+1
> > I tested this version via buster-backports, because so far I was
> > unable to build my own kernel from 5.6.y or even 5.7.y.
> >
> > Workaround:
> > ==========
> > Reverting the mentioned commit
> > d2345d1231d80ecbea5fb764eb43123440861462 on top of 4.19.132 is
> > fixing the problem. Reverting the same commit on 4.19.118 or
> > 4.19.116 is also fixing the problem.
>
> Sorry, I dropped the ball on this.
>
> Berni has verified that this problem still exists in v6.1.4, and has
> attached current dmesg logs and lspci output.
>
> Sean's comment
> (https://bugzilla.kernel.org/show_bug.cgi?id=208507#c18) suggests
> this is actually a ddbridge driver issue related to INTx emulation or
> MSI support.
>
> Berni confirmed that the i2c timeouts happen when
> CONFIG_DVB_DDBRIDGE_MSIENABLE is not enabled, and that enabling MSI
> via the "ddbridge.msi=1" module parameter avoids the i2c timeouts.
>
> The Kconfig help for DVB_DDBRIDGE_MSIENABLE:
>
> Use PCI MSI (Message Signaled Interrupts) per default. Enabling this
> might lead to I2C errors originating from the bridge in conjunction
> with certain SATA controllers, requiring a reload of the ddbridge
> module. MSI can still be disabled by passing msi=0 as option, as
> this will just change the msi option default value.
>
> suggests that there may be an i2c or SATA issue that could be fixed so
> ddbridge MSI could be always enabled. But I don't know about that
> underlying issue.
>
> Per MAINTAINERS, the ddbridge driver looks orphaned, so I cc'd the
> media folks and Daniel, who might know something about the MSI issues,
> based on adaf4df70521 ("media: ddbridge: Kconfig option to control the
> MSI modparam default").

Bjorn/all,

I'll try to at least clarify what I remember from "back then". It has
been over 4.5 years since I last actively worked on mainlining the
vendor driver, and I am not affiliated with the vendor at all.

MSI defaults to disabled, with the possibility to try ddbridge.msi=1,
for several reasons: quite a bit of user feedback, mainly at
vdr-portal.de; recommendations from the vendor itself for cases of I2C
trouble with their cards; and an explanation from the vendor (I was in
contact with them at that time) telling me that "some" PCIe/chipsets
are buggy, especially with regard to MSI, and can thus cause trouble
with their cards. In addition, I experienced I2C trouble with
ddbridge.msi=1 myself, so I decided to make the default "0" (with an
option to toggle the default, marked as "dangerous" due to the known
possible issues), as this worked best for everyone who opted to test
the mainlined driver code. Lacking real knowledge about chipsets, PCIe,
interrupts and so on, and lacking technical docs about the cards, we
could only assume we were dealing with some sort of incompatibility
between the cards and possibly certain components. ddbridge.msi=0,
however, made everything play nicely together.

Regarding #208507, I see that there are chipsets and/or platforms that
either soft-disable legacy IRQ signalling/routing (via the
mentioned/reverted/bisected commit) or don't support it at all. On
those, MSI becomes a requirement, and interrupts are not serviced at
all if MSI isn't available or isn't enabled by the device.

If it is possible to query the underlying subsystems about the
availability of legacy interrupt signalling, I guess the best option is
to perform that check in the ddbridge driver during setup and enable
MSI signalling if it is the only option, regardless of ddbridge.msi=x.
According to the reporter, this seems safe on such platforms (card
running fine with msi=1).
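
To make the intended decision concrete, here is a minimal userspace
sketch of the selection logic only. It is not driver code and does not
use any kernel API: the enum, the function name choose_irq_mode() and
the two capability flags are hypothetical, and how the driver would
actually learn whether legacy interrupts are usable is exactly the open
question above.

```c
#include <stdbool.h>

/* Hypothetical model of the proposal, not kernel code. */
enum irq_mode {
	IRQ_MODE_NONE,	/* no usable interrupt delivery at all */
	IRQ_MODE_INTX,	/* legacy interrupt signalling */
	IRQ_MODE_MSI,	/* Message Signaled Interrupts */
};

/*
 * msi_param:   value of the msi module parameter (0 or 1)
 * intx_usable: assumption: the platform can still deliver legacy IRQs
 * msi_usable:  assumption: MSI can be enabled for the device
 */
static enum irq_mode choose_irq_mode(int msi_param, bool intx_usable,
				     bool msi_usable)
{
	/* The user asked for MSI and it is available: honour that. */
	if (msi_param && msi_usable)
		return IRQ_MODE_MSI;

	/* Default case (msi=0), or MSI unavailable: use legacy IRQs. */
	if (intx_usable)
		return IRQ_MODE_INTX;

	/* Legacy IRQs are not usable: override msi=0 and use MSI. */
	if (msi_usable)
		return IRQ_MODE_MSI;

	return IRQ_MODE_NONE;
}
```

With msi=0, this keeps today's behaviour wherever legacy interrupts
work, and only switches to MSI on platforms where nothing else would
deliver interrupts.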

I'll happily provide patches for such a change, though I would very
much welcome any guidance on the subsystem query topic.

Best regards,
Daniel