Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk

From: liaoweixiong
Date: Thu Feb 06 2020 - 08:10:58 EST


hi Miquel Raynal,

On 2020/1/23 AM 1:41, Miquel Raynal wrote:
Hello,


+/*
+ * All zones will be read as pstore/blk will read zone one by one when do
+ * recover.
+ */
+static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
+{
+ struct mtdpstore_context *cxt = &oops_cxt;
+ size_t retlen;
+ int ret;
+
+ if (mtdpstore_block_isbad(cxt, off))
+ return -ENEXT;
+
+ pr_debug("try to read off 0x%llx size %zu\n", off, size);
+ ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
+ if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen) {

IIRC size != retlen does not mean it failed, but that you should
continue reading after retlen bytes, no?
>>
Yes, you are right. I will fix it. Thanks.
Also, mtd_is_bitflip() does not mean that you are reading a false
buffer, but that the data has been corrected as it contained bitflips.
mtd_is_eccerr() however, would be meaningful.
>>
Sure I know mtd_is_bitflip() does not mean failure, but I do not think
mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
mtd_is_bitflip().

Yes, just drop this check, only keep ret < 0.

If I don't get it wrong, it should not be dropped here. Like your words,
"mtd_is_bitflip() does not mean that you are reading a false buffer,
but that the data has been corrected as it contained bitflips.", the
data I get are valid even if mtd_is_bitflip() return true. It's correct
data and it's no need to go to handle error. To me, the codes
should be:
if (ret < 0 && !mit_is_bitflip())
[error handling]

Please check the implementation of mtd_is_bitflip(). You'll probably
figure out what I am saying.

https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585


How about the codes as follows:

for (done = 0, retlen = 0; done < size; done += retlen) {
ret = mtd_read(..., &retlen, ...);
if (!ret)
continue;
/*
* do nothing if bitflip and ecc error occurs because whether
* it's bitflip or ECC error, just a small number of bits flip
* and the impact on log data is so small. The mtdpstore just
* hands over what it gets and user can judge whether the data
* is valid or not.
*/
if (mtd_is_bitflip(ret)) {
dev_warn("bitflip at....");
continue;
} else if (mtd_is_eccerr(ret)) {
dev_warn("eccerr at....");
retlen = retlen == 0 ? size : retlen;
continue;
} else {
dev_err("read failure at...");
/* this zone is broken, try next one */
return -ENEXT;
}
}


|...]

+ return;
+ }
+ if (unlikely(info->dmesg_size % mtd->writesize)) {
+ pr_err("record size %lu KB must align to write size %d KB\n",
+ info->dmesg_size / 1024,
+ mtd->writesize / 1024);

This condition is weird, why would you check this?
>>
pstore/blk will write 'record_size' dmesg log at one time.
Since each write data must be aligned to 'writesize' for flash, I am not
sure
all flash drivers are compatible with misaligned data, that's why i
check this.

I think you should enforce this alignment instead of checking it.

Do you mean that mtdpstore should enforce this alignment while running?
The way I can think of is to handle a buffer aligned to writesize and
write to flash with this aligned buffer.

That causes some error. The MTD device will be divided into mutil
chunks accroding to dmesg_size. Each chunk stores a individual
OOPS/Panic log. If dmesg_size is misaligned to writesize, the last
write results in next write failure because the page of flash can only
be programed once before next erase and the page shared by two chunks
has been used by the last write. Besides, we can not read to buffer,
ersae and write back as there is no read/erase for panic case.

I mean: what is the usual size of dmesg? I don't get why you need it to

The usual size of dmesg is 64K, usually be equal to log_buf size.

be ie. a multiple of 2k. It probably is actually, I don't know if there
is a standard. But if dmesg_size is eg 3k, just skip the end of the
partially written page and start writing at the next page?


1. upper layer do not support to skip partially written page
The upper layer pstore/blk will not skip the end of the partially
written page since it is not only used for MTD device, but also
block device, which has no page limited. A common practice at the
upper layer is to check the size and limit size to be aligned. We
make dmesg_size to be a multiple of 4K for greater compatibility.

2. chunks management and size per write
The mtdpstore tells pstore/blk how large the device is. Then
pstore/blk will divide it into several chunks according to
dmesg_size. The pstore/blk will write dmesg_size data at a time.

In a word, the amount of data written each time can not lead to page
slicing, so, dmesg_size must be aligned to writesize.


+ return;
+ }
+ if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
+ pr_err("mtd%d is too large (limit is %d MiB)\n",
+ mtd->index,
+ MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);

Same question? I could understand that it is easier to manage blocks
knowing their maximum number though.
>>
It refers to mtdoops.

What do you mean?

To me, it's unnecessary to check at all, however it is really there
on codes of mtdoops. I refer to module mtdoops when I design mtdpstore.
It may be helpfull for some cases out of my think, that's why I keep it.

Why not.


OK, I will drop it.

[...]


In case of repeated erase when users remove several log files, mtdpstore
do remove jobs when exit.

Besides, mtdpstore do not check the return code to ensure write back valid
log as much as possible.

You are not in a critical path, I don't understand why you don't check
it? If it returns an error, it means the data is not written. IMHO it
is best to alert the user than to silently fail.

This function will be called only when mtd device is removing. It's
useless to alert the user but try to flush the other valid data to

It is useful to alert the user! It means something went wrong while
everything seems fine.

flash as mush as possible by which reduce losses. If it's just
because of busy, what happens next time?

Just because of busy? I don't get it.

I want to express that if the write fails due to busy, the next one
may succeed.


I'm okay with the idea of trying to write the other chunks though:

while (remaining_chunk) {
ret = mtd_write()
if (ret) {
alert-user;
continue;
}
}


OK, I will fix it.


+. >>>> + off += zonesize;
+ size -= min_t(unsigned int, zonesize, size);
+ }
+
+free:
+ kfree(buf);
+ return ret;
+}
+


[...]

Thanks,
MiquÃl
>>
I will collect more suggestions and submit the new version at one time.

Sure, no hurry.

I am on holiday, please forgive me for my slow response.

Take your time, as I said, no hurry.



Thanks,
MiquÃl




Thanks,
MiquÃl