The Rising Criticality of Mean Time-to-Repair and How to Get Ahead of It
By Michal Erel, Senior Product Manager | March 10, 2021
It’s been about a year since the COVID-19 pandemic catalyzed the rapid expansion of remote workforces and dramatically increased the need for remote access. These conditions meant that practically overnight, operational technology (OT) managers and industrial network administrators found themselves on the front lines—providing online connectivity to OT engineers who typically access industrial control system (ICS) equipment physically. Suddenly, remote access to industrial environments became essential in order to keep production lines and critical industrial processes running smoothly.
As we build towards a future in which workforces are increasingly distributed, now is the time for OT and IT security leaders to step back and determine if their remote-access solutions meet the needs of their OT engineers for the long haul. For many organizations, the evaluation criteria for these types of solutions revolve largely around security. And given the findings of our latest Biannual ICS Risk and Vulnerability Report, as well as the conditions surrounding the recent attack against a Florida water treatment facility, it is understandable why security typically is (and should be) a top priority in this context.
Often overlooked, however, is that meeting an OT engineer’s remote-access needs isn’t just about security. The access a solution provides must also be frictionless and reliable. If it isn’t, even the most secure solution is still likely to increase an OT environment’s exposure to risk.
Why? The answer revolves around a key performance indicator widely used among OT and IT security staff alike: mean time-to-repair (MTTR).
For OT engineers, MTTR is the average amount of time it takes to complete a repair or other necessary maintenance task on an industrial asset—from the moment at which the engineer connects to the asset and initiates the repair, to the moment at which they finalize the repair and disconnect from the asset. Generally speaking, the longer the MTTR, the worse the consequences of both diagnosed and undiagnosed maintenance issues are likely to be. This tends to be especially true when such issues pertain to critical vulnerabilities, dangerous configuration errors, or anything else that poses a considerable threat to OT availability, integrity, and/or safety.
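As a rough illustration of the definition above, MTTR can be computed as the average connect-to-disconnect duration across repair sessions. The short Python sketch below assumes each session is recorded as a pair of timestamps. The function name and the sample sessions are hypothetical, made up purely for illustration, and are not taken from any product or real environment.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(sessions):
    """Return the average connect-to-disconnect duration as a timedelta.

    `sessions` is a list of (connected_at, disconnected_at) datetime pairs.
    This record shape is an assumption made for this sketch.
    """
    if not sessions:
        raise ValueError("at least one repair session is required")
    # Sum the individual repair durations, starting from a zero timedelta.
    total = sum((end - start for start, end in sessions), timedelta())
    return total / len(sessions)


# Hypothetical repair sessions (illustrative values only).
sessions = [
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 9, 20)),    # 20 minutes
    (datetime(2021, 3, 2, 14, 0), datetime(2021, 3, 2, 14, 45)),  # 45 minutes
    (datetime(2021, 3, 3, 8, 30), datetime(2021, 3, 3, 8, 55)),   # 25 minutes
]

mttr = mean_time_to_repair(sessions)
print(mttr)  # 0:30:00
```

Because every minute between connecting and disconnecting counts toward the average, time lost to logins, lockouts, or an unfamiliar interface pushes MTTR up just as much as time spent on the repair itself.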
Many variables can affect MTTR, but for remote OT engineers, the user experience of the remote access solution they rely on is arguably one of the most significant. When a solution’s user experience is neither frictionless nor reliable, the OT engineers who use it are forced to contend with challenging conditions that can increase MTTR and the exposure to risks associated with it.
To illustrate how a remote access solution’s user experience can impact an OT engineer’s MTTR, consider how the following example scenario plays out with two different solutions:
Scenario: A remote OT engineer is notified of a programmable logic controller (PLC) configuration error requiring emergency repair within 30 minutes to avoid downtime.
Solution 1: An IT-oriented remote access solution that provides remote access to the OT environment via a series of jump servers connected to the corporate VPN
To connect to the PLC, the engineer must go through four login screens, each with its own set of credentials.
It takes five minutes to go through the first three login screens. Once the engineer reaches the fourth, they forget their password. After several attempts, they’re locked out.
The engineer calls the OT manager for a password reset, which takes 15 minutes. By the time they go back through all four login screens, they have only five minutes to repair the PLC.
At last, the engineer connects to the PLC, only to encounter an impossibly complex interface that looks nothing like their on-premises human-machine interface (HMI). Unsure of where to go or what to click to make the repair, the engineer runs out of time and downtime ensues.
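Solution 2: A secure remote access solution purpose-built for industrial networks, such as Claroty SRA
The engineer connects to the PLC rapidly through a frictionless, secure login process with password vaulting.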
A simple interface that mirrors the engineer’s on-premises HMI enables them to easily navigate and repair the PLC with plenty of time to spare.
Claroty SRA reduces MTTR and boosts uptime for OT environments by making it faster and easier for OT engineers to safely connect to and troubleshoot industrial assets at any time, from anywhere, and address exploitable flaws that could lead to process interruption or manipulation.
MTTR is also an important metric for the security leader, typically the CISO, who is responsible and accountable for ensuring that the entire network, spanning both IT and OT environments, is up and running as it should be. The lower the MTTR, the lower the risk of downtime and of disruption to operations and business performance.
I’m sure you can think of many additional scenarios where a secure remote access solution, purpose-built for industrial networks, would help you reduce MTTR and increase business resilience.
To learn more about how Claroty SRA gives OT engineers and CISOs peace of mind that they’re minimizing exposure to risks and keeping operations running smoothly, request a demo.