Re: [PATCH v8 3/9] misc: smpro-errmon: Add Ampere's SMpro error monitor driver

From: Quan Nguyen
Date: Thu Jun 02 2022 - 05:36:43 EST


On 01/06/2022 16:33, Greg Kroah-Hartman wrote:
On Wed, Jun 01, 2022 at 03:21:47PM +0700, Quan Nguyen wrote:
+ if (err_type & BIT(2)) {
+ /* Error with data type */
+ ret = regmap_read(errmon->regmap, err_info->err_data_low, &data_lo);
+ if (ret)
+ goto done;
+
+ ret = regmap_read(errmon->regmap, err_info->err_data_high, &data_hi);
+ if (ret)
+ goto done;
+
+ count = sysfs_emit(buf, "%01x%02x%01x%02x%04x%04x%04x\n",
+ 4, (ret_hi & 0xf000) >> 12, (ret_hi & 0x0800) >> 11,
+ ret_hi & 0xff, ret_lo, data_hi, data_lo);
+ /* clear the read errors */
+ ret = regmap_write(errmon->regmap, err_info->err_type, BIT(2));
+
+ } else if (err_type & BIT(1)) {
+ /* Error type */
+ count = sysfs_emit(buf, "%01x%02x%01x%02x%04x%04x%04x\n",
+ 2, (ret_hi & 0xf000) >> 12, (ret_hi & 0x0800) >> 11,
+ ret_hi & 0xff, ret_lo, data_hi, data_lo);
+ /* clear the read errors */
+ ret = regmap_write(errmon->regmap, err_info->err_type, BIT(1));
+
+ } else if (err_type & BIT(0)) {
+ /* Warning type */
+ count = sysfs_emit(buf, "%01x%02x%01x%02x%04x%04x%04x\n",
+ 1, (ret_hi & 0xf000) >> 12, (ret_hi & 0x0800) >> 11,
+ ret_hi & 0xff, ret_lo, data_hi, data_lo);

Hi Greg,

Since the internal representation of the internal error is split into high
and low chunks of the info and data values which need to be communicated
atomically, I'm treating them as "one value" here.

That is a huge "one value", that's not what this really is, it needs to
be parsed by userspace, right?

Thanks Greg for the review,

User space needs all of this "one value" to know exactly what the error is.

In our latest version, we removed all the if...else branches and simplified the code as below:
/*
 * The internal representation of the internal error is split into high
 * and low chunks of the info and data values. Rather than temporarily
 * dumping these into an array and printing that, skip the intermediate
 * step and print them using a concatenation encoding.
 */
count = sysfs_emit(buf, "%04x%04x%04x%04x\n", info_h, info_l, data_h, data_l);

/* clear the read error */
ret = regmap_write(errmon->regmap, err_info->type, err_type);
return ret ? ret : count;

And why does this have to be atomic? What happens if the values change
right after you read them? What is userspace going to do with them?

Because the error is bigger than a single register can hold, it is split into small chunks that are reported via multiple separate registers.

Firmware stores each error in a queue. Because an error's chunks are stored in separate registers, all of these registers need to be read out before the error is cleared so that the next error in the queue can be reported. That is why we say those chunks must be read out atomically.
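
To illustrate the ordering described above, here is a small user-space toy model, not driver code. The queue depth, the struct names and the fw_push()/fw_read_then_clear() helpers are all hypothetical and only mimic the behaviour stated in this mail: every chunk of the head error is read first, and only then is the error cleared so the next queued error becomes visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FW_QUEUE_DEPTH 4	/* assumed depth, for illustration only */

/* One error, split into four 16-bit chunks as in the driver. */
struct fw_error {
	uint16_t info_h, info_l, data_h, data_l;
};

/* Hypothetical stand-in for the firmware's error queue. */
struct fw_model {
	struct fw_error queue[FW_QUEUE_DEPTH];
	size_t head;
	size_t count;
};

/* Queue a new error; returns -1 if the model's queue is full. */
static int fw_push(struct fw_model *fw, struct fw_error e)
{
	if (fw->count == FW_QUEUE_DEPTH)
		return -1;
	fw->queue[(fw->head + fw->count) % FW_QUEUE_DEPTH] = e;
	fw->count++;
	return 0;
}

/*
 * Read all four chunks of the head error, then clear it.
 * Clearing before the last chunk is read would expose the next
 * error's chunks and mix two errors in one report.
 */
static int fw_read_then_clear(struct fw_model *fw, struct fw_error *out)
{
	assert(fw != NULL && out != NULL);

	if (fw->count == 0)
		return -1;

	/* Step 1: read every chunk of the current error. */
	out->info_h = fw->queue[fw->head].info_h;
	out->info_l = fw->queue[fw->head].info_l;
	out->data_h = fw->queue[fw->head].data_h;
	out->data_l = fw->queue[fw->head].data_l;

	/* Step 2: clear, which lets the next queued error be reported. */
	fw->head = (fw->head + 1) % FW_QUEUE_DEPTH;
	fw->count--;
	return 0;
}
```

In the real driver the two steps would be the regmap_read() calls followed by the regmap_write() that clears the error. The model only shows why they are treated as one unit.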

User space will need to parse this information itself.
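
As a rough sketch of what that parsing could look like in user space, assuming the "%04x%04x%04x%04x\n" format from the snippet above: the struct and function names below are hypothetical and are not part of the patch. Only the field order (info_h, info_l, data_h, data_l) is taken from the sysfs_emit() call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container; field names mirror the driver's variables. */
struct smpro_err_record {
	uint16_t info_h, info_l, data_h, data_l;
};

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse one line of exactly 16 hex digits, optionally followed by '\n'.
 * Returns 0 on success, -1 on malformed input.
 */
static int smpro_parse_record(const char *line, struct smpro_err_record *out)
{
	uint16_t field[4] = { 0 };
	size_t i;

	assert(line != NULL && out != NULL);

	for (i = 0; i < 16; i++) {
		int n = hex_nibble(line[i]);

		if (n < 0)	/* also stops at an early '\0' */
			return -1;
		field[i / 4] = (uint16_t)((field[i / 4] << 4) | n);
	}
	if (line[16] != '\n' && line[16] != '\0')
		return -1;

	out->info_h = field[0];
	out->info_l = field[1];
	out->data_h = field[2];
	out->data_l = field[3];
	return 0;
}
```

Splitting the four words back out is the easy part. Interpreting the bit fields inside each word would still need the firmware's error format, which is not described in this thread.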

I could dump them in a
temporary array and print that, but it seems like additional complexity for
the same result. Can we consider this concatenated encoding as "an array of
the same type" for the purposes of this driver?

That's really not a good idea as sysfs files should never need to be
"parsed" like this.
Again, what are you trying to do here, and why does it have to be
atomic?

thanks,

greg k-h