RE: [RFC][PATCH v2 2/3] Hold multiple logs

From: Seiji Aguchi
Date: Thu Jul 19 2012 - 20:39:39 EST



Thank you for describing this in detail.

> Yes - if the OOPs is instrumental in the path leading to the hang/panic - then the OOPS is the first place to look for the root cause of
> the problem. But it will be a case by case analysis.
> Sometimes the OOPS might be unconnected. If possible we'd like to log more information to allow detective work to decide whether
> there is a connection. But as I mentioned above there are severe limits to how much better things are by storing more information.

I understand why you think 3 or 4 logs are reasonable.
There are some cases where the 2nd or 3rd oops is the critical one.

I have some enterprise customers who are sensitive to software failures and specify panic_on_oops=1.
In this case, they don't need 3 or 4 logs; 2 logs are enough.

So, the kernel parameter should behave as follows.

Log_num=1
- For users who want to hold just one log.

Log_num=2
- For users who can handle multiple logs and care mainly about the 1st oops (by specifying panic_on_oops=1).

Log_num=3,4
- For users who care about the 2nd or 3rd oops.

Log_num=5 or more
- Invalid value.
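
As a rough sketch of the range check this implies (the names validate_log_num, LOG_NUM_MIN and LOG_NUM_MAX are made up for illustration and are not taken from the patch; rejecting 0 and negative values is also my assumption, since only "5 or more" is discussed above):

```c
#include <errno.h>

/* Hypothetical bounds, taken from the proposal in this mail, not from the patch. */
#define LOG_NUM_MIN 1
#define LOG_NUM_MAX 4

/*
 * Return 0 when log_num is in the accepted range, -EINVAL otherwise.
 * Values below 1 are rejected as an assumption; the proposal only
 * states that 5 or more is invalid.
 */
static int validate_log_num(int log_num)
{
	if (log_num < LOG_NUM_MIN || log_num > LOG_NUM_MAX)
		return -EINVAL;
	return 0;
}
```

With this check, Log_num=1 through Log_num=4 would be accepted and Log_num=5 would be refused rather than silently clamped. Whether to refuse or clamp is open for discussion.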

If I have misunderstood anything, please let me know.

Seiji

> -----Original Message-----
> From: Luck, Tony [mailto:tony.luck@xxxxxxxxx]
> Sent: Thursday, July 19, 2012 7:42 PM
> To: Seiji Aguchi; linux-doc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; mikew@xxxxxxxxxx; dzickus@xxxxxxxxxx; Matthew
> Garrett (mjg@xxxxxxxxxx)
> Cc: dle-develop@xxxxxxxxxxxxxxxxxxxxx; Satoru Moriya
> Subject: RE: [RFC][PATCH v2 2/3] Hold multiple logs
>
> > If you are concerned about multiple OOPS case, I think an user app which logs from /dev/pstore to /var/log should be developed.
>
> Agreed - we need an app/daemon to do this.
>
> > Once it is developed, we don't need to care about multiple oops case and the appropriate number is two.
>
> Only if you can guarantee that the app/daemon will run and save the first OOPS before the next occurs. Even if the system were
> running normally this might be difficult to achieve.. but in this case we know the system isn't running normally (it just OOPSed twice!).
>
> However - there is progressively less value in collecting additional consecutive OOPS. Perhaps one is enough 90% or even 99% of the
> time. I'm naturally paranoid so having two or three would make me feel happy that most of the remaining 10% or 1% of the cases
> were covered.
>
> > - In case where system is workable after oops.
> > The user app will erase an entry in NVRAM.
> > And we can get the message via /var/log.
>
> Yes - the system can keep running after many types of OOPs - so the OOPS will be logged in /var/log (or by the app/daemon copying
> from pstore, or both).
>
> > - In case where system hangs up or panics due to the oops.
> > Oops is the critical message and we don't need care about subsequent events.
>
> Yes - if the OOPs is instrumental in the path leading to the hang/panic - then the OOPS is the first place to look for the root cause of
> the problem. But it will be a case by case analysis.
> Sometimes the OOPS might be unconnected. If possible we'd like to log more information to allow detective work to decide whether
> there is a connection. But as I mentioned above there are severe limits to how much better things are by storing more information.
>
> -Tony