Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver

From: Paolo Bonzini
Date: Wed Nov 16 2016 - 07:11:03 EST


> Not sure how independent ERST is from ACPI and other specs. It looks
> like referencing UEFI spec at least.

It is just the format of error records that comes from the UEFI spec
(include/linux/cper.h) but you can ignore it, I think. It should be
handled by tools on the host side. For you, the error log address
range contains a CPER header followed by a binary blob. In practice,
you only need the record length field (bytes 20-23 of the header),
though it may be a good idea to validate the signature at the beginning
of the header.

> Btw, is the ERST used for pstore only (in Linux)?

Yes. It can store various records, including dmesg and MCE.

There are other examples in QEMU of interfaces with ACPI. They all use the
DSDT, but the logic is similar. For example, docs/specs/acpi_mem_hotplug.txt
documents the memory hotplug interface. In all cases, ACPI tables contain small
programs that talk to specialized hardware registers, typically allocated to
hard-coded I/O ports.

In your case, the registers could occupy 16 consecutive I/O ports, like the
following:

0x00 read/write operation type (0=write,1=read,2=clear,3=dummy write)

0x01 read-only bit 7: if set, operation in progress

bit 0-6: operation status, see "Command Status Definition" in
the ACPI spec

0x02 read-only when read:

- read a 64-bit record id from the store to memory,
from the address that was last written to 0x08.

- if the id is valid and is not the last id in the store,
write the next 64-bit record id to the same address

- otherwise, write the first record id to the same address,
or 0xffffffffffffffff if the store is empty

0x03 unused, read as zero

0x04-0x07 read/write offset of the error record into the error log address range

0x08-0x0b read/write when read, return number of stored records

when written, the written value is a 32-bit memory address,
which points to a 64-bit location used to communicate record ids.

0x0c-0x0f read/write when read, always return -1 (together with the "mask" field
and READ_REGISTER, this lets ERST instructions return any value!)

when written, trigger the pstore operation:

- if the current operation is a dummy write, do nothing

- if the current operation is a write, write a new record, using
the written value as the base of the error log address range. The
length must be parsed from the CPER header.

- if the current operation is a clear, read the record id
from the memory location that was last written to 0x08 and do the
operation. the value written is ignored.

- if the current operation is a read, read the record id from the
memory location that was last written to 0x08, using the written
value as the base of the error log address range.

In addition, the firmware will need to reserve a few KB of RAM for the error log
address range (I checked a real system and it reserves 8KB). The first eight
bytes are needed for the record identifier interface, because there's no such
thing as 64-bit I/O ports, and the rest can be used for the actual buffer.

QEMU already has an interface to allocate RAM and patch the address into an
ACPI table (bios_linker_loader_alloc). Because this interface is actually meant
to load data from QEMU into the firmware (using the "fw_cfg" interface), you
would have to add a dummy 8KB file to fw_cfg using fw_cfg_add_file (for
example "etc/erst-memory"), it can be just full of zeros.

QEMU supports two chipsets, PIIX and ICH9, and the free I/O port ranges are
different. You could use 0xa20 for ICH9 and 0xae20 for PIIX.

All in all, the contents of the ERST table would not be very different from a
non-virtual system, except that on real hardware the firmware would use SMIs
as the trap mechanism. You almost have a one-to-one mapping between ERST
actions and registers accesses:

BEGIN_WRITE_OPERATION write value 0 to register at 0x00
BEGIN_READ_OPERATION write value 1 to register at 0x00
BEGIN_CLEAR_OPERATION write value 2 to register at 0x00
BEGIN_DUMMY_WRITE_OPERATION write value 3 to register at 0x00
END_OPERATION no-op
CHECK_BUSY_STATUS read register at 0x01 with mask 0x80
GET_COMMAND_STATUS read register at 0x01 with mask 0x7f
SET_RECORD_OFFSET write register at 0x04
GET_RECORD_COUNT read register at 0x08
EXECUTE_OPERATION write ERST memory base + 8 to 0x0c
GET_ERROR_LOG_ADDRESS_RANGE read register at 0x0c (with mask = ERST memory base + 8)
GET_ERROR_LOG_ADDRESS_RANGE_LENGTH read register at 0x0c (with mask = 8192 - 8 = 8184)
GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES read register at 0x0c (with mask = 0)

Only the get/set record identifier instructions are a little harder:

GET_RECORD_IDENTIFIER write ERST memory base to register at 0x08
read register at 0x02
read eight bytes at ERST memory base

SET_RECORD_IDENTIFIER write ERST memory base to register at 0x08
write eight bytes at ERST memory base

On top of this, you need to add the APEI UUID (see apei_osc_setup in Linux)
to build_q35_osc_method, and use "-M q35" when you start QEMU. If you need
more help just ask. I or others can help you with the ACPI glue, then you
can write the file backend yourself, based on your existing virtio-pstore code.

> Also I need to control pstore driver like using bigger buffer,
> enabling specific message types and so on if ERST supports. Is it
> possible for ERST to provide such information?

It's the normal pstore driver, same as on a real server. What exactly do you
need?

Paolo