Posts tagged 'hardware'

How to identify the physical DIMM from a Machine Check Exception (MCE) memory error log

This is a short rewrite of a post I wrote elsewhere that is no longer easily searchable or accessible. If you've got a DIMM that's going bad and your system supports Machine Check Architecture (MCA) / Machine Check Exceptions (MCEs), you might see memory error alerts in your logs or console output. They typically look something like this:

    MCA: Bank 9, Status 0x8c000047000800c0
    MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
    MCA: Vendor "GenuineIntel", ID...
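As a rough illustration of what the bank Status value in that sample encodes, here is a minimal Python sketch that splits it into the architectural fields of the IA32_MCi_STATUS register as laid out in the Intel SDM. The function name `decode_mci_status` and the dictionary layout are my own assumptions for illustration. This is not the DIMM-identification method from the full post, and it deliberately leaves the model-specific bits uninterpreted.

```python
# Architectural flag bits of IA32_MCi_STATUS (Intel SDM, Vol. 3B).
FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # IA32_MCi_MISC holds valid data
    58: "ADDRV",  # IA32_MCi_ADDR holds a valid address
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit bank status into flags and error codes.

    Hypothetical helper: it only extracts the architectural fields and
    does not interpret model-specific bits.
    """
    set_flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": set_flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# Status value taken from the sample log line above.
decoded = decode_mci_status(0x8C000047000800C0)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

For the sample value, the set flags are VAL, MISCV and ADDRV, and UC is clear, so the bank is reporting a corrected error with a valid address. That address is the starting point for tracing the error to a physical DIMM.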

Investigating a failure to read DIMM SPD data on Intel Xeon Scalable platforms

DIMMs carry a small non-volatile memory chip (an EEPROM) that holds an important descriptor table called the Serial Presence Detect (SPD) data. This data tells the system the size, speed, timings, operating voltage, manufacturer, part number, overclocking profiles, and other details of each DIMM. The SPD chip is accessed using the SMBus protocol, which is based on I2C. Tools such as CPU-Z, RAMMon, and RW Everything can read the SPD data by talking to the EEPROM over a...
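To make the SPD layout a little more concrete, here is a small Python sketch of two well-known details: SPD EEPROMs conventionally answer at 7-bit SMBus addresses 0x50 to 0x57, and byte 2 of the SPD data identifies the DRAM device type. The helper names and the three-byte sample buffer are hypothetical. The sketch only parses bytes that have already been read, and it does not touch the SMBus itself or show how any of the tools above work.

```python
SPD_BASE_ADDR = 0x50  # conventional 7-bit SMBus address of the first SPD EEPROM

# SPD byte 2 ("key byte") values for common DRAM device types.
DEVICE_TYPES = {
    0x07: "DDR SDRAM",
    0x08: "DDR2 SDRAM",
    0x0B: "DDR3 SDRAM",
    0x0C: "DDR4 SDRAM",
    0x12: "DDR5 SDRAM",
}


def spd_smbus_address(slot: int) -> int:
    """Return the 7-bit SMBus address for an SPD EEPROM.

    `slot` is assumed to be the 0-7 value strapped on the EEPROM's
    address pins; how slots map to pins is board specific.
    """
    if not 0 <= slot <= 7:
        raise ValueError("SPD slot index must be in the range 0-7")
    return SPD_BASE_ADDR + slot


def dram_device_type(spd: bytes) -> str:
    """Decode the DRAM device type from byte 2 of an SPD dump."""
    if len(spd) < 3:
        raise ValueError("need at least 3 bytes of SPD data")
    key = spd[2]
    return DEVICE_TYPES.get(key, f"unknown (0x{key:02X})")


# Hypothetical first three bytes of an SPD dump, for illustration only.
sample = bytes([0x23, 0x11, 0x0C])
print(hex(spd_smbus_address(2)), dram_device_type(sample))
```

Unrecognised key bytes are reported as unknown rather than guessed, since the table above covers only a handful of device types.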