Monitoring Hard Drive Failures through Kaseya

Does losing a client’s data keep you up at night?

As a managed services provider, one of our biggest fears is the loss of a client’s data. It is probably the single thing we worry about most, and we discuss it constantly. At Network Depot (our local MSP division), we spend considerable resources monitoring backups to ensure that if disaster strikes, we will be prepared.

One thing that had eluded us was the consistent monitoring of server hard drive failures. From time to time we would notice an array with a degraded or failed drive, but we were not being notified by the standard event set monitors.

A few months ago we figured out what was happening. It turns out that Dell and HP seem to think that a degraded drive is NOT an ERROR event type; it only merits a WARNING, and we weren’t paying attention to warnings.

We created a new event set called “VA – Hard Drive Warnings” and started monitoring the System log for any events containing the word “degraded”. This started bearing fruit, and we included it in all new templates across all our systems, but it didn’t seem to catch everything.
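
To see why a keyword match alone falls short, here is a minimal Python sketch of the idea. It is not how Kaseya evaluates event sets internally; the `matches_keyword` helper and the sample records are hypothetical, with the descriptions borrowed from the Dell table later in this post.

```python
# Hypothetical sample of System log entries; descriptions come from the
# Dell table later in this post, not from a real log export.
SAMPLE_EVENTS = [
    {"event_id": 2057, "type": "Warning", "description": "Virtual disk degraded"},
    {"event_id": 2049, "type": "Warning", "description": "Physical disk removed"},
    {"event_id": 2123, "type": "Warning", "description": "Redundancy lost: Virtual disk"},
    {"event_id": 2048, "type": "Error", "description": "Physical disk failed"},
]


def matches_keyword(event, keyword="degraded"):
    """Return True when the keyword appears in the description (case-insensitive)."""
    return keyword.lower() in event["description"].lower()


# Split the sample into events the keyword filter catches and events it misses.
caught = [e["event_id"] for e in SAMPLE_EVENTS if matches_keyword(e)]
missed = [e["event_id"] for e in SAMPLE_EVENTS if not matches_keyword(e)]

print("caught:", caught)
print("missed:", missed)
```

In this sample only event 2057 is caught. A removed disk, lost redundancy, and even an outright failed disk slip through because their descriptions never use the word “degraded”, which is why matching on source and event ID is more reliable.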

As MSPs, we often get so lost in the weeds that we don’t have time to step back and see the big picture. Yesterday, our CTO Benjamin spent his Sunday researching this issue and documented the different types of events that HP and Dell generate.

HP: If you have HP servers, you need to make sure that the HP Insight Management WBEM Providers are installed. It is the WBEM providers that write these events to the Windows Event Logs. You can find a complete description of the events here: https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c04436799

Log: System
Source: HP SAS, HP SATA, HP SCSI, HP SmartArray

We included the following events in our event set:

Source: HP SAS

Source	Event ID
HP SAS	102
HP SAS	103
HP SAS	202
HP SAS	204
HP SAS	311
HP SAS	312

Source: HP SATA

Source	Event ID
HP SATA	604
HP SATA	605

Source: HP SCSI

Source	Event ID
HP SCSI	3
HP SCSI	5
HP SCSI	8
HP SCSI	10

Source: HP SmartArray

Source	Event ID
HP SmartArray	102
HP SmartArray	103
HP SmartArray	104
HP SmartArray	202
HP SmartArray	204
HP SmartArray	206
HP SmartArray	207
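
For anyone scripting their own checks, the HP tables above can be expressed as a simple lookup. This is only a sketch in Python: the `HP_DRIVE_EVENTS` name and the `is_hp_drive_event` helper are made up for illustration and are not part of Kaseya or any HP tool. The sources and event IDs are copied from the tables above.

```python
# Event IDs per source, copied from the tables above.
HP_DRIVE_EVENTS = {
    "HP SAS": {102, 103, 202, 204, 311, 312},
    "HP SATA": {604, 605},
    "HP SCSI": {3, 5, 8, 10},
    "HP SmartArray": {102, 103, 104, 202, 204, 206, 207},
}


def is_hp_drive_event(source, event_id):
    """Return True when the (source, event ID) pair is one of the pairs listed above."""
    return event_id in HP_DRIVE_EVENTS.get(source, set())


print(is_hp_drive_event("HP SmartArray", 206))  # listed under HP SmartArray
print(is_hp_drive_event("HP SAS", 206))         # 206 is not listed under HP SAS
```

The lookup is keyed on the source as well as the ID because the same ID can appear under more than one source (102 is listed for both HP SAS and HP SmartArray), while others belong to a single source only.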

Dell: If you have Dell servers, you need to make sure that Dell OpenManage Server Administrator is installed and configured. It is OpenManage that creates the event log entries.

Log: System
Source: Server Administrator

Event ID	Type	Description
2065	Informational	Rebuild started
2158	Informational	Physical disk online
2121	Informational	Device returned to normal
2052	Informational	Physical disk inserted
2057	Warning	Virtual disk degraded
2049	Warning	Physical disk removed
2123	Warning	Redundancy lost: Virtual disk
2050	Warning	Physical disk is offline
2048	Error	Physical disk failed
2299	Error	Bad PHY slot
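
The table also shows why checking only the Error type is not enough. The Python sketch below models the event type checkboxes as a plain set. The `DELL_EVENTS` and `ids_for_types` names are hypothetical and this is not Kaseya's API; the IDs and types are copied from the table above.

```python
# Event ID -> (type, description), copied from the table above.
DELL_EVENTS = {
    2065: ("Informational", "Rebuild started"),
    2158: ("Informational", "Physical disk online"),
    2121: ("Informational", "Device returned to normal"),
    2052: ("Informational", "Physical disk inserted"),
    2057: ("Warning", "Virtual disk degraded"),
    2049: ("Warning", "Physical disk removed"),
    2123: ("Warning", "Redundancy lost: Virtual disk"),
    2050: ("Warning", "Physical disk is offline"),
    2048: ("Error", "Physical disk failed"),
    2299: ("Error", "Bad PHY slot"),
}


def ids_for_types(checked_types):
    """Return the sorted event IDs whose type is among the checked types."""
    return sorted(
        event_id
        for event_id, (event_type, _description) in DELL_EVENTS.items()
        if event_type in checked_types
    )


errors_only = ids_for_types({"Error"})
errors_and_warnings = ids_for_types({"Error", "Warning"})

print("Error only:", errors_only)
print("Error + Warning:", errors_and_warnings)
```

With only Error checked, the sketch returns 2048 and 2299, so a degraded virtual disk (2057) or a removed disk (2049) would never raise an alert. Adding Warning brings in 2049, 2050, 2057, and 2123 as well.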

We updated our event set, and I encourage all of you to review your server policy settings or templates and add this event set. When you do, select the System log and be sure to check both Error and Warning.

BTW, I can’t stress enough that you MUST make sure that your vendor’s management software is installed and configured correctly! Without it, you are unlikely to get these events at all!

I hope this helps! If anyone has any suggestions for improvement or has settings for IBM or Lenovo, please let me know, and I will update the post!