Asked a question 8 years ago

What is an MCE failure? Why is it running alongside the disk burn test?

What is an MCE failure?

MCE stands for Machine Check Exception: an error the CPU raises when it detects a hardware problem. MCEs should not be ignored.

If the kernel reports MCEs, it is highly likely that the hardware it is running on is not functioning properly and that the vendor needs to fix something.
Most commonly, they are uncovered during Bright's burn (stress test) of the cluster.

Why is it running alongside the disk burn test?

The mce_check burn test constantly monitors the kernel for MCE reports, which is why it runs in parallel with the disk burn test, as well as with almost all other tests. Quite often the underlying problem is memory-related, but in some cases stressing the disks will also trigger an MCE. The exact MCE errors are logged to a file in the node's burn spool.
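
To give a rough idea of what "monitoring the kernel for MCE reports" can look like, here is a minimal Python sketch that filters kernel log text for lines that look like machine check reports. It is not how mce_check is implemented. The function name find_mce_lines, the match pattern, and the sample log lines are all assumptions made for illustration.

```python
import re

# Assumed pattern: matches kernel log lines that mention a machine check or
# carry the "[Hardware Error]" tag. This is an illustrative guess, not the
# pattern that Bright's mce_check test uses.
MCE_PATTERN = re.compile(r"machine check|\[Hardware Error\]", re.IGNORECASE)


def find_mce_lines(log_text):
    """Return the lines of log_text that look like MCE reports."""
    return [line for line in log_text.splitlines() if MCE_PATTERN.search(line)]


if __name__ == "__main__":
    # Illustrative sample only; on a real node the text could come from
    # the output of `dmesg` or from the system log.
    sample = "\n".join([
        "EXT4-fs (sda1): mounted filesystem with ordered data mode",
        "mce: [Hardware Error]: Machine check events logged",
        "usb 1-1: new high-speed USB device number 2 using xhci_hcd",
    ])
    for line in find_mce_lines(sample):
        print(line)
```

Running a check like this repeatedly while other stress tests load the node is the general idea behind watching for MCEs in parallel with the rest of the burn.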

Have a look in the appendix on "Burning Nodes" for more on doing burns in general.