A key element of every industrial control system (ICS) cybersecurity program is awareness and training. However, the materials now used mostly target the information technology (IT) security community and, to a lesser extent, automation and control engineers. Hardly any content is relevant for people such as process and instrumentation engineers and operators who deal with operational technology (OT).
Figure 1. A risk register should form the heart of any program.
A company must align awareness and training with its ICS cybersecurity program. Figure 1 shows the general structure of a typical program. At the heart of such a program is the risk register, a living document that feeds into the core elements of a cybersecurity program: governance, risk management, implementation and operations. In this article, we’ll highlight various topics to underscore the awareness and training requirements for these core elements. While by no means a complete list, the subjects covered will draw attention to some key topics within ICS cybersecurity and the prevailing thought process in the industry.
Risk Management
This begins with assessments of manufacturing sites to understand the current risk profile and generate a risk register. A common misconception is that you must treat each vulnerability as a potential risk to operations. Such an approach results in an unwieldy and unmanageable list of recommendations that have little or no relation to operational risk. Risk management methodology must focus on the importance of translating vulnerabilities and threats to operational risk within the plant environment.
Risk assessment methodologies such as cyber PHA (process hazards analysis) use concepts the process community already understands well, such as loss of view and loss of control. They contextualize the exploitation of vulnerabilities and threats for a process engineer and describe them in relatable terms. Figure 2 illustrates such an approach.
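To make the idea of translating vulnerabilities into operational risk concrete, here is a minimal Python sketch of a risk register. The 1-to-5 likelihood and severity scales, the cutoff of 6, and the sample scenarios are illustrative assumptions, not part of any cyber PHA standard; a real assessment would use the site’s own risk matrix.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    scenario: str      # vulnerability/threat pairing found during the assessment
    consequence: str   # operational consequence, stated in process terms
    likelihood: int    # assumed scale: 1 (remote) to 5 (frequent)
    severity: int      # assumed scale: 1 (minor) to 5 (catastrophic)

    @property
    def score(self) -> int:
        # Simple likelihood x severity ranking (an assumption for illustration)
        return self.likelihood * self.severity


def build_register(entries, threshold=6):
    """Keep only scenarios with meaningful operational risk, highest first."""
    kept = [e for e in entries if e.score >= threshold]
    return sorted(kept, key=lambda e: e.score, reverse=True)


# Hypothetical findings from a site assessment
entries = [
    RiskEntry("Unpatched HMI server reachable from business network", "loss of view", 3, 4),
    RiskEntry("Default password on lab printer", "none identified", 2, 1),
    RiskEntry("Shared subnet for SIS and control network", "loss of control", 2, 5),
]

register = build_register(entries)
for e in register:
    print(f"{e.score:>2}  {e.consequence:<16} {e.scenario}")
```

The printer finding drops out because it has no credible path to an operational consequence, which is the point: the register holds operational risks, not every vulnerability found.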
Network Architecture
Securing an ICS network starts with developing a good as-built network architecture and segmentation rules incorporating industry best practices. Most common cybersecurity issues result from misconfigured networks and devices. So, let’s look at some prevailing gaps in network architecture and design, common misconceptions and underappreciated issues:
• Is my process control network (PCN) segmented? A control system platform typically consists of many different consoles and application software performing diverse functions commonly referred to as engineering workstation (EWS), human/machine interface (HMI) server, operator clients, local historian, maintenance station and others. It’s common practice to: include EWS functionality on a client located in the field; combine the local historian and HMI server in the same computer; put critical systems such as a safety instrumented system on the same subnet as the rest of the control network; and share data from the distributed control system (DCS) across the “fence line” with third parties such as customers, equipment suppliers, etc. (The source of a recent cybersecurity breach at a pipeline infrastructure company was tracked to a compromised network at one of its suppliers.) In general, the consequence of a cybersecurity incident will differ significantly depending on which functionality/asset is compromised. Therefore, the network architecture should reflect this by segmenting assets of similar functionality/criticality into their own security zones connected by conduits, as shown in Figure 3. A zone and conduit framework helps in breaking the overall system into smaller manageable pieces to aid in analyzing the vulnerabilities, gaps and risks within each zone and designing appropriate security control measures.
• How resilient is my PCN? Such networks have evolved over time from air-gapped proprietary ones to networks that integrate open protocols, remote access, manufacturing execution system applications, etc. The configuration and administration of a PCN typically takes place in an ad hoc manner for a variety of reasons, e.g., lack of dedicated resources, fear of interrupting operations by modifying network settings, and deadlock between IT and OT. Common configuration errors and vulnerabilities include incorrect spanning tree topology settings and misconfigured virtual local area networks and access ports. Consequently, these networks aren’t as resilient as expected and are vulnerable from an availability perspective. Even a minor change to the network such as upgrading a managed switch can result in unpredictable behavior. (Upgrading a switch to another from the same manufacturer has led to major sections of a PCN breaking down, substantially impacting operations. It took several weeks of troubleshooting to identify the root cause and mitigate the problem.) Optimizing the network performance by correcting misconfigurations and locking down security loopholes will significantly increase the availability and resiliency of the PCN. This potentially is “low-hanging fruit” that provides immediate return on investment without significant capital outlay.
Figure 3. Grouping assets with similar functionality/criticality into separate security zones is good practice.
• Do we need a demilitarized zone (DMZ)? Plants rely on communication channels or conduits such as OLE for Process Control (OPC) to share data from the process control system with the long-term historian, advanced process control applications and other business applications needing these data. Most installations struggle with a standardized approach to share data securely. It’s common practice to deploy multiple protocols and data collectors to send data through a PCN firewall or a similar boundary-protection device directly to the enterprise network. The IT world calls this “punching a hole through the firewall.” Proper segmentation rules such as the Purdue control hierarchy model dictate that communications between network levels must terminate at the next higher or lower level. For instance, a data collector node in the PCN only should be able to communicate up to the DMZ network. Assets within a DMZ network serve as a buffer between the underlying PCN and the business network. A well-designed DMZ network offers the capability and flexibility to share process data across the enterprise securely while, at the same time, containing and localizing the impact to plant operations should a cybersecurity event occur.
• How secure is my DCS network? Medium/large continuous plants typically rely on a DCS to control and operate the main process. Such plants also use programmable logic controllers (PLCs) to control peripheral equipment and processes that are relatively less critical than those handled by the DCS. A network connection invariably exists between the DCS and PLC to share status and other information. Typically, DCS-related assets such as consoles, servers, controllers, networks, etc., are managed very tightly with respect to cybersecurity controls following vendor guidelines and best practices. The nature of the DCS architecture also eases centralized engineering, change management, etc., to implement and manage cybersecurity. However, the situation differs dramatically on the PLC side of the plant. Very few cybersecurity controls or policies and procedures exist, primarily due to the heterogeneous nature of the PLC network. Multiple PLC brands and decentralized engineering and maintenance procedures (e.g., users plugging their engineering laptops into the network, outsourcing work to system integrators and contractors, and remote access provided to help vendors troubleshoot their equipment) make implementing consistent cybersecurity controls, policies and procedures daunting. As a result, a PLC network can serve as a backdoor and a pivot to introduce malware and denial-of-service attacks that can compromise the DCS, resulting in serious consequences to plant operations. This is a real problem at almost every plant; awareness is rising about the need to improve the cybersecurity of PLC networks as well as to secure the interface between DCS and PLC networks through additional segmentation and industrial firewalls.
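The segmentation rule discussed above, that communications must terminate at the next higher or lower level, lends itself to a simple audit. The Python sketch below is only an illustration: the zone names, the level numbers (including 3.5 for the DMZ) and the conduit list are hypothetical, and a real audit would work from firewall rules and the as-built architecture.

```python
# Hypothetical zones and level numbers, loosely following the Purdue hierarchy.
# 3.5 is used here for the DMZ between the PCN and the business network.
ZONE_LEVEL = {
    "controllers": 1,
    "hmi_servers": 2,
    "pcn_operations": 3,
    "dmz": 3.5,
    "business": 4,
}

# Ordered list of the levels actually present, used to find "adjacent" levels
LEVELS = sorted(set(ZONE_LEVEL.values()))


def conduit_allowed(src, dst):
    """A conduit may only terminate in the same level or an adjacent one."""
    i = LEVELS.index(ZONE_LEVEL[src])
    j = LEVELS.index(ZONE_LEVEL[dst])
    return abs(i - j) <= 1


def audit(conduits):
    """Return the conduits that skip over one or more levels."""
    return [(s, d) for s, d in conduits if not conduit_allowed(s, d)]


# Hypothetical conduits taken from an as-built drawing
conduits = [
    ("pcn_operations", "dmz"),
    ("dmz", "business"),
    ("pcn_operations", "business"),  # data collector sending straight to enterprise
    ("controllers", "hmi_servers"),
]

violations = audit(conduits)
for src, dst in violations:
    print(f"Conduit {src} -> {dst} bypasses an intermediate level")
```

Here the direct PCN-to-business conduit is flagged because it bypasses the DMZ. The same check could be extended to the DCS/PLC interface by placing each PLC network in its own zone.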
Cybersecurity Testing
Similar to factory acceptance testing (FAT) of a control system, cybersecurity configuration must be tested and verified before the system leaves the factory. Once the system is commissioned, testing and making cybersecurity-related changes isn’t practical due to fear of impacting normal operation. Cyber FAT typically is performed during FAT and followed up during site acceptance testing (Cyber SAT). It involves systematically checking and validating the cybersecurity configuration against a formal specifications document or vendor-published guidelines. Because such testing is done in a controlled environment, aggressive scanning tools can be used to identify known vulnerabilities within the system so they can be corrected before shipment.
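Checking a configuration against a specification is, at its core, a comparison of expected and observed settings. The Python sketch below shows the idea under stated assumptions: the setting names and values are hypothetical and don’t come from any vendor guideline.

```python
def check_config(spec, actual):
    """Return (setting, expected, found) for every deviation from the spec."""
    deviations = []
    for setting, expected in spec.items():
        found = actual.get(setting, "<missing>")
        if found != expected:
            deviations.append((setting, expected, found))
    return deviations


# Hypothetical hardening requirements from a cybersecurity specification
spec = {
    "usb_storage": "disabled",
    "default_accounts": "removed",
    "unused_switch_ports": "disabled",
    "antivirus": "installed",
}

# Hypothetical settings recorded from a workstation during Cyber FAT
actual = {
    "usb_storage": "enabled",
    "default_accounts": "removed",
    "antivirus": "installed",
}

deviations = check_config(spec, actual)
for setting, expected, found in deviations:
    print(f"{setting}: expected {expected}, found {found}")
```

Rerunning the same check during Cyber SAT gives a quick way to confirm that nothing drifted between the factory and the site.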
Penetration testing also is a useful tool to learn more about existing vulnerabilities and how various threat vectors can exploit them. It complements risk-based assessment by taking a deeper look at critical zones and conduits that were identified during the assessment. The results from penetration testing help generate cybersecurity requirements specifications and drive standardization.
Patch Management
Applying patches issued by Microsoft for Windows-based hosts and staying current with the patch-issuing cycle is at the core of patch management. What makes this seemingly mundane task so complicated? While Microsoft tests and approves these patches, control system vendors must test and validate them for applicability to their systems. Not all vendor-approved patches are created equal. Some patches may not be necessary depending on the applications running on a host. A patch approved by one control system vendor may not be applicable for others. Some patches may require a system reboot. To make matters worse, there’s always a chance that one or more patches may cause a host to behave unpredictably. Historically, plants have combatted these challenges by deploying patches during controlled events such as a scheduled shutdown, maintenance turnaround, etc. However, these events are infrequent. So, process control computers significantly lag behind on patch status. With recent ransomware attacks on OT networks, a better way to stay current on vendor-approved Microsoft patches is essential. Newer products and services are being developed to meet this urgent need.
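The decision logic described above (vendor approval, applicability to the host and reboot requirement) can be sketched in a few lines of Python. The patch identifiers, host roles and the split into “online” and “outage” lists are hypothetical; the sketch doesn’t reflect any particular vendor’s qualification process.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    patch_id: str           # hypothetical identifier, not a real bulletin number
    vendor_approved: bool   # validated by the control system vendor
    applies_to: frozenset   # host roles the patch is relevant for
    needs_reboot: bool


def plan(patches, host_role):
    """Split applicable, vendor-approved patches into online vs. outage work."""
    online, outage = [], []
    for p in patches:
        if not p.vendor_approved or host_role not in p.applies_to:
            continue  # skip unapproved or irrelevant patches
        (outage if p.needs_reboot else online).append(p.patch_id)
    return online, outage


patches = [
    Patch("P-001", True, frozenset({"hmi", "historian"}), False),
    Patch("P-002", True, frozenset({"hmi"}), True),
    Patch("P-003", False, frozenset({"hmi"}), False),   # not yet vendor approved
    Patch("P-004", True, frozenset({"historian"}), False),
]

online, outage = plan(patches, "hmi")
print("Install while running:", online)
print("Hold for next outage:", outage)
```

Separating patches that need a reboot from those that don’t is one way to shrink the backlog that otherwise waits for the next turnaround.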
Gap/Maturity Assessments
Typically, an ICS cybersecurity program would use these types of assessments to complement a risk assessment. Gap assessments focus on providing a scorecard or similar metric to help an organization determine how it stacks up against peers or industry standards, a practice referred to as industry benchmarking. A maturity assessment goes one step further by correlating the gaps against a tier-based ranking such as the National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF) levels as shown in Figure 4. Combining these results with risk reduction levels derived from a risk assessment, e.g., cyber PHA, provides a consolidated view of the cybersecurity posture across multiple plants and helps an organization prioritize and plan remediation projects.
Figure 4. Correlating gaps against a tier-based ranking can provide important insights.
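As a rough illustration of combining maturity and risk results across plants, the Python sketch below ranks sites for remediation. The tier names follow the NIST CSF implementation tiers; the site names, the risk scores and the ranking rule (highest risk first, then lowest tier) are assumptions made for the example.

```python
# NIST CSF implementation tiers
TIERS = {1: "Partial", 2: "Risk Informed", 3: "Repeatable", 4: "Adaptive"}

# Hypothetical results: maturity tier from a maturity assessment and a
# risk score from a risk assessment such as cyber PHA
sites = [
    {"site": "Plant A", "tier": 2, "risk_score": 12},
    {"site": "Plant B", "tier": 1, "risk_score": 12},
    {"site": "Plant C", "tier": 3, "risk_score": 6},
]


def prioritize(sites):
    """Rank sites: highest risk first, then least mature first (assumed rule)."""
    return sorted(sites, key=lambda s: (-s["risk_score"], s["tier"]))


ranked = prioritize(sites)
for rank, s in enumerate(ranked, start=1):
    print(f"{rank}. {s['site']}: risk {s['risk_score']}, tier {s['tier']} ({TIERS[s['tier']]})")
```

An organization might weight the two inputs differently; the value lies in putting both on one page for every plant.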
Cybersecurity Tools
The ICS community is showing growing interest in and awareness of the various technologies and solutions available for addressing the five core functions of the NIST CSF: identify, protect, detect, respond and recover. Companies now have myriad options; differentiating them often is challenging. The products are in varying stages of maturity in terms of end-user adoption. Their capabilities span asset inventory, anomaly detection, threat intelligence, patch management and configuration change management.
As companies begin to assess products and services, they struggle to define the appropriate scope for evaluation. For example, asset inventory can pose a challenge that may require more than one tool. While some tools focus on the Windows-level assets and network devices, others provide visibility into the control network and assets such as PLCs and smart instruments. In addition, the OT network contains several repositories of information such as control system vendor-specific tools, custom spreadsheets and others that capture bits and pieces of asset inventory data. Location information including plant site name, building, rack, cabinet, etc., also is valuable to link to each asset, so a company can identify and respond quickly to an alert or vulnerability. Automated tools can’t extract these types of data, so the chosen tool must have the flexibility to accommodate manual data entries and support importation from external data sets and customization.
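The need to combine automatically discovered assets with manually maintained location data can be illustrated with a short Python sketch. The asset tags, field names and sample records are hypothetical; a real tool would import from spreadsheets or vendor-specific repositories.

```python
def merge_inventory(discovered, manual):
    """Merge tool-discovered assets with manual records, keyed by asset tag."""
    merged = {}
    for asset in discovered:
        merged[asset["tag"]] = dict(asset)
    for row in manual:
        # Add location fields to a known asset, or create a manual-only asset
        merged.setdefault(row["tag"], {"tag": row["tag"]}).update(row)
    return merged


def missing_location(inventory):
    """List assets that still lack a cabinet location."""
    return sorted(tag for tag, a in inventory.items() if "cabinet" not in a)


# Hypothetical output of an automated discovery tool
discovered = [
    {"tag": "PLC-101", "type": "PLC", "ip": "10.0.1.10"},
    {"tag": "HMI-01", "type": "HMI server", "ip": "10.0.2.5"},
]

# Hypothetical rows imported from a site spreadsheet
manual = [
    {"tag": "PLC-101", "site": "Plant A", "building": "B2", "cabinet": "C-14"},
    {"tag": "FT-205", "type": "smart transmitter", "site": "Plant A",
     "building": "B2", "cabinet": "C-03"},
]

inventory = merge_inventory(discovered, manual)
print("Assets in inventory:", sorted(inventory))
print("Missing location data:", missing_location(inventory))
```

The list of assets without a location gives a punch list for a walkdown, so that an alert on any asset can be traced to a physical cabinet quickly.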
Proceed Prudently
We’ve looked at some of the key issues, gaps and tools for enhancing cybersecurity. Improving cybersecurity awareness and training is critical for success. This requires catering to the disparate groups at a plant. You should distill role-based training of concepts and trends into plain easy-to-understand language and contextualize it for the process engineering, maintenance and operating environments. To serve this diverse audience, explore a variety of delivery methods including computer-based training modules.
KRISH SRIDHAR is Greenville, SC-based senior business manager, industrial cybersecurity, for aeSolutions.