Re: [PATCH] iio: dht11: Read bit stream from IRQ on falling edges only

From: harald
Date: Sat Feb 11 2023 - 16:26:00 EST


On 2023-02-11 11:41, pelzi@xxxxxxxxxxxxxxx wrote:
Am 07.02.23 um 11:33 schrieb harald@xxxxxxxxx:
2) A theoretical analysis about possible regressions depending on timer
resolution as mentioned in an earlier message.

This sounds as if you were doing such an analysis for the original
version. Can you share this work so I can attempt to repeat it
for the modified algorithm?

The short version is in the comments. The relevant section is:
/*
* Data transmission timing:
* Data bits are encoded as pulse length (high time) on the data line.
* 0-bit: 22-30uS -- typically 26uS (AM2302)
* 1-bit: 68-75uS -- typically 70uS (AM2302)
* The acutal timings also depend on the properties of the cable, with
* longer cables typically making pulses shorter.
*
* Our decoding depends on the time resolution of the system:
* timeres > 34uS ... don't know what a 1-tick pulse is
* 34uS > timeres > 30uS ... no problem (30kHz and 32kHz clocks)
* 30uS > timeres > 23uS ... don't know what a 2-tick pulse is
* timeres < 23uS ... no problem

The long version I probably don't have anymore, but it's not rocket
science. Just multiples of the time resolution. Eg:
34 = 68/2
23 = 68/3

3) Ideally figuring out, why your version performs better then what we
currently have. I have some suspicions, but better understanding might
lead to a better approach. E.g. maybe recording the other edges isn't
the problem so long as we ignore them during decoding?

As I see it, the main thing we are losing with your current proposal is
some diagnostic features. If we keep them as much as possible and have
regressions understood and covered, I see no reason to reject your idea.

That's why I changed the script to separately count EIO and ETIMEDOUT.
The latter indicates missed edges, the former failure to interpret
the data read.

What I see is that the patched driver's errors mostly result from missed
IRQ (note in contrast to last results, I cut the number of reads):

#    real[s]    user[s]    sys[s]    success    EIO    timeout err per succ
1     20.57    0.25    0.03    10    0    0    0
2     24.74    0.25    0.07    10    0    4    0,4
3     21.55    0.20    0.07    10    0    0    0
4     25.81    0.25    0.08    10    0    5    0,5
5     21.56    0.23    0.05    10    0    0    0
6     21.58    0.22    0.05    10    1    0    0,1
7     25.86    0.24    0.08    10    1    5    0,6
8     22.69    0.27    0.05    10    1    1    0,2
9     23.67    0.26    0.04    10    0    2    0,2
10     20.55    0.23    0.04    10    0    0    0

Whereas the original driver has more errors resulting from
mis-interpreted data:

#    real[s]    user[s]    sys[s]    success    EIO    timeout err per succ
1     24.88    0.26    0.07    10    5    4    0,9
2     25.91    0.26    0.07    10    4    5    0,9
3     31.27    0.31    0.10    10    6    10    1,6
4     29.17    0.32    0.11    10    7    8    1,5
5     22.73    0.24    0.08    10    4    2    0,6
6     46.46    0.35    0.25    10    19    24    4,3
7     23.79    0.23    0.09    10    3    3    0,6
8     30.17    0.27    0.11    10    6    9    1,5
9     23.77    0.26    0.06    10    3    2    0,5
10     20.58    0.24    0.06    10    1    0    0,1

I tried a variant that reads falling and rising edges and
uses the redundany of information to eliminate some errors.
This did not work out at all.

That's an interesting data point. Care to share the code?

It seems a relevant source of
trouble is delayed call to the IRQ handler. The problem is
that only then you try to find out if this IRQ is due to
rising or falling edge by reading the current GPIO level. When
you are to late, this might already have changed and you read
a level, but for the edge of _this_ level you'll receive another
IRQ a few us later.

I doubt this interpretation. Mostly I don't think you would even
get a second interrupt in this case: It is just a flag that
indicates something has changed, not a counter.

I expect, that you just get one missing edge (which we don't notice,
because we are tolerant about a missing preamble), which would
show as two consecutive edges of the same value - not three as
your explanation suggests.

I don't see, why it wouldn't be possible to recover from that,
in cases, where the delay is small enough for your version to work.

So the reason that this patch here is showing
lower error rates seems to be the lower probability of such
things happening by halving the IRQs to be handled, _plus_
the information from the hardware, that this IRQ was due
to a falling edge.

The first part is likely true at the moment and seems enough to
explain the data you have shown. I still believe we could be
smart about the second part in software.

Thanks,
Harald