Human error is a bigger problem within the data center industry than many realize. Research has shown that human error is the root cause of 60 to 80 percent of all downtime instances. What’s more, FORTRUST Chief Operating Officer Robert McClary identified human error as one of the most likely causes of unplanned outages and created strategies specifically to mitigate it in the data center.
There are several behaviors and strategies clients can look for with their data center and colocation providers that can signal the provider’s commitment to eliminating human error. Let’s take a look:
1) Robust processes and documentation
In FORTRUST’s eBook, “A Data Center Operations Guide for Maximum Reliability,” McClary recommended establishing not only specific operational process controls and procedures, but also a robust strategy for documenting that activity.
“Process control and the comprehensive documentation of processes are critical because many unplanned downtime events are the result of human error,” McClary wrote. “Documented, validated and repeatable processes create a standardized approach to operations, service delivery and maintenance while mitigating or eliminating the risk associated with human error.”
In other words, every operational process that takes place within the data center should follow a documented, validated and well-practiced procedure.
While it can take some time and effort on the part of the data center managers and staff to create, document and maintain these procedures, this approach comes with considerable benefits. In addition to mitigating human error, having a library of procedures in place can also encourage consistency, support continuous training and learning, and help establish a knowledge base among staff members. This all goes a long way toward ensuring that problems never crop up in the first place.
2) Staff training to ensure the necessary skills
It’s also important for data center staff to have the skills needed to keep operations running smoothly, as well as to pinpoint and address any problems before they lead to downtime.
Certain skills are critical, while others can be taught over time. Overall, staff members should understand the basics of electrical and mechanical systems, the interrelationships between data center systems and how to troubleshoot common issues that can appear in these types of environments. Staff should also have robust interpretation and analytical problem-solving skills.
In order to build up a consistent knowledge base, service providers should also train their staff on a regular basis. McClary noted that many facility operators offer short-lived “on the job” training, but do not necessarily continue this education. Training must be ongoing, and each individual employee should take responsibility for his or her education and competency.
Documented processes and procedures can provide the foundation for training efforts. As this library is continually changing and expanding, additional training can ensure a keen understanding of each staff member’s role, responsibility and required skills.
3) Inspections and walkthroughs
It’s critical that data center employees take the time to physically walk through and inspect all critical systems in the facility. These walkthroughs can take place in conjunction with training efforts, helping staff members recognize key components and any issues that might arise.
Data center managers should establish a few documented procedures for their inspections to help guide these efforts. These include a list of the items that should be checked during the walkthrough, the specific parameters that staff members should record, and the steps that should be taken in response to the recorded values.
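As a rough illustration of such an inspection procedure, the Python sketch below models a walkthrough checklist: each item names what to check, the acceptable range for the recorded parameter, and the step to take when a reading falls outside that range. Everything in it is an assumption made for illustration, including the item names, the ranges and the follow-up actions; it is not FORTRUST’s actual checklist or tooling.

```python
from dataclasses import dataclass


@dataclass
class CheckItem:
    """One line of a hypothetical walkthrough checklist."""
    name: str      # item to inspect during the walkthrough
    unit: str      # unit of the recorded parameter
    low: float     # lowest acceptable reading (placeholder value)
    high: float    # highest acceptable reading (placeholder value)
    action: str    # step to take when the reading is out of range


# Placeholder items and limits, chosen only to make the example concrete.
CHECKLIST = [
    CheckItem("cold_aisle_temp", "C", 18.0, 27.0, "notify facilities lead"),
    CheckItem("ups_load", "%", 0.0, 80.0, "escalate to electrical team"),
]


def review(items, readings):
    """Return (name, value, action) for every missing or out-of-range reading."""
    findings = []
    for item in items:
        value = readings.get(item.name)
        if value is None:
            # A skipped item is itself a finding: the walkthrough was incomplete.
            findings.append((item.name, None, "reading missing: re-inspect"))
        elif not (item.low <= value <= item.high):
            findings.append((item.name, value, item.action))
    return findings


if __name__ == "__main__":
    recorded = {"cold_aisle_temp": 29.5, "ups_load": 40.0}
    for name, value, action in review(CHECKLIST, recorded):
        print(f"{name}: {value} -> {action}")
```

Keeping the acceptable range and the follow-up step next to each item means the person doing the walkthrough does not have to rely on memory to decide what a reading means, which is the point of documenting the procedure in the first place.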
McClary pointed out that while these walkthroughs surely take time, they can also help staff identify easily corrected issues, preventing them from leading to larger-scale problems later on.
Overall, the key elements of preventing human error boil down to having the right strategies and procedures in place, training staff members and taking the time to inspect critical systems. FORTRUST leverages these and a range of other processes within our Denver data center, which has enabled us to surpass the 15-year mark for continuous critical systems uptime. These three components are critical to FORTRUST’s designation as a Tier-III data center operator.
To find out more about what our staff does to eliminate the chances of human error, contact us for a tour of our facility today.