1) What is a PSOD?
A Purple Screen of Death (PSOD) is a diagnostic screen with white type on a purple background that is displayed when the VMkernel of an ESX/ESXi host experiences a critical error, becomes inoperative and terminates any virtual machines that are running.
The screen appears on the server's console. To see it, you either need to be in the datacenter with a monitor connected, or connect remotely through the server's out-of-band management (iLO, iDRAC, IMM… depending on your vendor).
2) Why does a PSOD happen?
A PSOD is a form of kernel panic. The ESXi kernel (VMkernel) triggers this safety measure in response to an unrecoverable error, where continuing to run would pose a high risk to the services and VMs. When the ESXi host detects that it has become corrupted, it stops and displays the purple screen with a long message and an error code. Hardware failures (for example RAM or CPU) are a common cause; they normally raise a Machine Check Exception (MCE) or a non-maskable interrupt (NMI) error.
3) What is the impact of a PSOD?
When the kernel panics, the host crashes and all services stop immediately. The VMs are not gracefully shut down but abruptly powered off. If the host is part of a cluster with HA configured, these VMs will be restarted on the other hosts in the cluster. If the host is part of a vSAN cluster, the PSOD will also impact vSAN.
4) Analyze the PSOD message
The first step is to take a screenshot. You can do this remotely through the out-of-band management (IMM, iLO or iDRAC, depending on your vendor).
There are seven pieces of information to look for on the PSOD screen:
1- The product and build number
2- The error message
3- The physical CPU registers at the time of the error
4- The physical CPU on which the error occurred
5- The host uptime
6- The stack trace (the state of the VMkernel at the time of the error)
7- The core dump
A few common PSOD errors and their KB articles.
Check the logs: the following log files are the most useful when investigating a PSOD.
Component | Location | Description |
System messages | /var/log/syslog.log | Contains all general log messages and can be used for troubleshooting. |
VMkernel | /var/log/vmkernel.log | Records activities related to virtual machines and ESXi. Most PSOD-relevant entries will be in this log, so pay special attention to it. |
ESXi host agent log | /var/log/hostd.log | Contains information about the agent that manages and configures the ESXi host and its virtual machines. |
VMkernel warnings | /var/log/vmkwarning.log | Records activities related to virtual machines. Watch for entries related to heap exhaustion (Heap WorkHeap). |
vCenter agent log | /var/log/vpxa.log | Contains information about the agent that communicates with vCenter, so you can use it to spot tasks triggered by vCenter that might have caused the PSOD. |
Shell log | /var/log/shell.log | Contains a record of all commands typed in the ESXi Shell, so you can correlate the PSOD with a command that was executed. |
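Going through these files by hand can be slow, so here is a minimal sketch of a keyword scan over the log files in the table. It assumes you have copied those files off the host into a local directory. The search patterns (MCE, Machine Check, NMI, Heap WorkHeap) are an assumption based on the errors mentioned in this article, not an official list of PSOD signatures, and the script name and function names (`scan_lines`, `scan_log_dir`) are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Hypothetical search patterns (an assumption, not an official PSOD signature
# list): MCE / NMI errors and heap exhaustion, as mentioned in this article.
DEFAULT_PATTERNS: Tuple[str, ...] = (
    r"\bMCE\b",
    r"Machine Check",
    r"\bNMI\b",
    r"Heap WorkHeap",
)

# Log files from the table, relative to the directory holding the copied logs.
LOG_FILES: Tuple[str, ...] = (
    "syslog.log",
    "vmkernel.log",
    "hostd.log",
    "vmkwarning.log",
    "vpxa.log",
    "shell.log",
)


def scan_lines(
    lines: Iterable[str], patterns: Tuple[str, ...] = DEFAULT_PATTERNS
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line matching any pattern, case-insensitively."""
    regex = re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


def scan_log_dir(log_dir: Path) -> List[Tuple[str, int, str]]:
    """Scan each known log file in log_dir and return (file, line_number, line) hits."""
    hits: List[Tuple[str, int, str]] = []
    for name in LOG_FILES:
        path = log_dir / name
        if not path.is_file():
            continue  # this file is not present in the copied logs
        try:
            with path.open("r", errors="replace") as handle:
                hits.extend((name, n, line) for n, line in scan_lines(handle))
        except OSError as exc:
            print(f"skipping {path}: {exc}", file=sys.stderr)
    return hits


if __name__ == "__main__":
    # Usage: python scan_psod_logs.py /path/to/copied/logs
    directory = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, number, line in scan_log_dir(directory):
        print(f"{name}:{number}: {line}")
```

Each hit is printed as `file:line: text`, so you can line it up against the host uptime shown on the PSOD screen. Add or remove patterns to match the error message from your own screenshot.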
Thanks, I hope you liked it.
Rajiv Pandey.