Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips

From: Russell King - ARM Linux admin
Date: Wed Dec 30 2020 - 14:19:26 EST


On Wed, Dec 30, 2020 at 06:31:52PM +0100, Pali Rohár wrote:
> On Wednesday 30 December 2020 17:05:46 Russell King - ARM Linux admin wrote:
> > On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote:
> > > This change is really required for those Realtek chips. I thought it
> > > was obvious that only one byte at a time can be read from *both*
> > > addresses 0x50 and 0x51. Reading 2 bytes (for a be16 value) cannot
> > > be done in one i2c transfer; it must be done in two.
> >
> > Then these modules are even more broken than first thought, and
> > quite simply it is pointless supporting the diagnostics on them
> > because we can never read the values in an atomic way.
>
> They are broken in a way that not even holy water would help them...
>
> But from the diagnostic address 0x51 we can at least read 8-bit
> registers in an atomic way :-)

... which doesn't fit the requirements.

> > It's also a violation of SFF-8472, which _requires_ multi-byte reads
> > to read these 16-bit values atomically. Reading them with individual
> > byte reads results in a non-atomic read, and the 16-bit value cannot
> > be trusted to be correct.
> >
> > That is really not optional, no matter what any manufacturer says - if
> > they claim the SFP MSAs allow it, they're quite simply talking out of
> > a donkey's backside and you should dispose of the module in biohazard
> > packaging. :)
> >
> > So no, I hadn't understood this from your emails, and as I say above,
> > if this is the case, then we quite simply disable diagnostics on these
> > modules since they are _highly_ noncompliant.
>
> We have just two options:
>
> Disable reads of 2 (or more) bytes from address 0x51 and therefore
> disable the sfp_hwmon_read_sensor() function.
>
> Or allow non-atomic 2-byte reads and get at least semi-correct values
> for hwmon. I guess that the upper 8 bits would not change too much
> between two single-byte i2c transfers (when they are done back to back).

So when you read the temperature, and the MSB reads as the next higher
value than the LSB, causing an error of 256, or vice versa causing an
error of -256, which when scaled according to the factors causes a big
error, that's acceptable?

No, it isn't. If the data can't be read reliably, the data is useless.
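
To illustrate the tearing, here is a minimal user-space sketch. It is not
kernel code and not what the sfp driver does; struct fake_reg, read_byte()
and torn_read() are made-up names that only model a 16-bit big-endian
register being updated by the module between two single-byte transfers.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 16-bit diagnostic register on the module. */
struct fake_reg {
	uint16_t value;
};

/* Model of a single-byte transfer: offset 0 is the MSB, offset 1 the LSB. */
static uint8_t read_byte(const struct fake_reg *r, int offset)
{
	return offset == 0 ? (uint8_t)(r->value >> 8)
			   : (uint8_t)(r->value & 0xff);
}

/*
 * Two single-byte reads, with the module updating the register to 'next'
 * in between. The result mixes the old MSB with the new LSB.
 */
static uint16_t torn_read(struct fake_reg *r, uint16_t next)
{
	uint8_t msb = read_byte(r, 0);
	uint8_t lsb;

	r->value = next;	/* module refreshes the value mid-read */
	lsb = read_byte(r, 1);

	return (uint16_t)((msb << 8) | lsb);
}
```

With the register going from 0x1aff to 0x1b00 between the two transfers,
torn_read() returns 0x1a00, which is 256 below the new value. Going the
other way, from 0x1b00 to 0x1aff, it returns 0x1bff, which is 256 above it.
Neither result was ever a value the module actually reported.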

Consider a system that implements userspace monitoring for modules and
checks the current values against pre-set thresholds - it suddenly gets
a value that is outside of its alarm threshold due to this. It raises a
false alarm. This is not good.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!