Today’s process control and safety systems provide massive amounts of valuable data that significantly enhance a plant’s ability to manage operations and troubleshoot problems. The systems have fostered impressive gains in productivity and safety. These improvements stem to a certain degree from alarms that warn of variables deviating from appropriate ranges, enabling operators to take corrective actions. However, the ease and low cost of adding alarms too often lead to almost unchecked growth in their number, which, in turn, causes alarm “floods” that tax an operator’s ability to identify and thus respond correctly to the key underlying issue. Some experts regard this as a problem of operators drowning in an ocean of data.
Dealing with this problem demands an effective alarm management (AM) system. Briefly put, such a system should provide an operator with actionable information and guidance for corrective action in a timely manner and should accommodate organizational and process changes over the lifespan of the alarm.
A number of standards provide guidelines for implementing an effective AM system; ISA-18.2 (IEC 62682), EEMUA-191 and API RP 1167 are the most frequently used. These standards use a systems approach and consider the entire alarm lifecycle.
Key Considerations
These standards offer an excellent roadmap for developing your AM plan. When working up your plan, keep the following key points in mind:
• Operators are the primary recipients of alarms; therefore, their input in developing an AM system is vital. In addition, engineers and maintenance personnel as well as safety and environmental professionals are important stakeholders.
• AM is not a one-time effort. It lasts for the entire lifecycle of the alarm system. The management system should be capable of handling changes in personnel, procedures and technology. As a corollary to this, well-designed operator training and change management will help ensure success of the AM system.
• Alarm system displays should be easy to grasp. Appropriate groupings of alarms, e.g., safety-critical alarms for an area or a piece of equipment or alarms related to environmental or other regulatory compliance, may enhance understanding.
• From a functional viewpoint, the alarm system should be robust (say, fault tolerant) and provide reliable and timely information so operators can take corrective action with confidence. Alarms representing scenarios with high consequence should be clearly visible.
• The importance of alarm records can’t be overstated. Today, alarm system records generally are part of larger corporate systems. So, consider the alarms and AM in the broader context of data and information management for the organization.
• Evaluate the impact of power failure on alarm availability in the AM system.
• Some AM systems may require extensive development efforts. So, to augment in-house expertise, you may need to seek help from control and safety system vendors, consultants and database management professionals.
• Streamline the AM system. Where possible, avoid paperwork and bureaucracy. Some bureaucracy is necessary — e.g., to guard against uncontrolled modifications by imposing strict management-of-change (MOC) procedures — but strike a balance between bureaucracy and efficiency.
Gap Analysis
For existing systems, it’s important to determine how they compare to recommended guidelines from the standards. In the gap analysis, you can observe the alarm system for a representative period or do an offline analysis. Such an analysis often uses 10-min segments. Consider:
• Alarm documentation and procedures for AM and their last updates;
• Alarm displays and whether operators easily understand them;
• Suppressed alarms and why they are suppressed or disabled;
• Frequency of alarms (average and maximum number of alarms in a 10-min period); and
• Duration of alarms.
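The frequency and duration checks in this list can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a vendor tool: the function names (alarms_per_window, rate_summary, durations_seconds) and the input format (activation times and activation/clear pairs exported from an alarm journal) are hypothetical.

```python
import math
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # the 10-min period used in the text


def alarms_per_window(activations, start, end, window=WINDOW):
    """Count alarm activations in consecutive fixed windows between start and end."""
    n_windows = max(1, math.ceil((end - start) / window))
    counts = [0] * n_windows
    for t in activations:
        if start <= t < end:
            counts[(t - start) // window] += 1
    return counts


def rate_summary(counts):
    """Average and maximum number of alarms per window."""
    return {"average": sum(counts) / len(counts), "maximum": max(counts)}


def durations_seconds(episodes):
    """Duration of each alarm episode given (activated, cleared) datetime pairs."""
    return [(cleared - activated).total_seconds() for activated, cleared in episodes]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 8, 0)  # made-up sample data
    sample = [t0 + timedelta(minutes=m) for m in (1, 3, 4, 12, 25)]
    counts = alarms_per_window(sample, t0, t0 + timedelta(minutes=30))
    print(counts, rate_summary(counts))
```

Fixed, non-overlapping windows keep the arithmetic simple; a sliding window would catch bursts that straddle a boundary, at the cost of more computation.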
Then, take a number of actions:
• Identify “bad actors” — e.g., alarms that keep going on and off (chatter), and ones that stay on too long (say, hours) — as well as stale and nuisance alarms.
• Determine the percentage of time that alarm rates fall outside acceptable limits. Standards note that an average of one alarm in a 10-min period is acceptable while an average of two alarms in a 10-min period, though manageable, would tend to stress out the operators.
• Pinpoint those alarms required for safe, efficient and regulatory-compliant operation. Operators, engineers and safety/environmental professionals should provide key inputs.
• Assign a priority — high, medium or low — to every alarm based on the consequence of a mishap associated with the alarm scenario and the response time available. Give alarms with high consequence and low response time high priority, and those with low consequence and longer response time low priority. For the system as a whole, the standards recommend the following distribution of alarms: high priority, ~5%; medium, ~15%; and low, ~80%.
• Find the causes of any high-frequency alarms; don’t eliminate any such alarms without careful analysis of their underlying causes. Potential culprits include: poor controller tuning, incorrect installation of sensors/transmitters, improper setting for the alarms (too close to the normal operating range), faulty grounding and inadequate deadband in alarming. Removing the causes can reduce some frequencies considerably.
• Adopt the standards’ suggested alarm frequency goal (an average of one alarm per 10-min period), and follow their guidelines for alarm delays:
— For any on/off delay, consider the impact on the process. Obviously, safety, productivity and compliance are the key criteria.
— For flow and pressure, aim for approximately 15 sec.
— For level and temperature, ~60 sec is acceptable.
— For analyzers, review your process and consider how an alarm delay can affect safety, quality and compliance.
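One way to picture the on/off delays listed above is as a filter on the raw alarm condition: the alarm is annunciated only after the condition has persisted for the on-delay, and it clears only after the condition has been absent for the off-delay. The Python sketch below is a simplified illustration. The DelayedAlarm class, the SUGGESTED_DELAY_S table and the sample-by-sample update design are assumptions made for the example; real control systems implement delays in their own alarm function blocks.

```python
# Delay values quoted in the text, in seconds, keyed by signal type.
SUGGESTED_DELAY_S = {"flow": 15, "pressure": 15, "level": 60, "temperature": 60}


class DelayedAlarm:
    """Alarm state with on-delay and off-delay (hypothetical helper)."""

    def __init__(self, on_delay_s, off_delay_s):
        self.on_delay_s = on_delay_s
        self.off_delay_s = off_delay_s
        self.active = False
        self._pending_since = None  # when the raw condition started to differ

    def update(self, t_s, condition):
        """Feed one sample: t_s is time in seconds, condition is the raw state."""
        if condition == self.active:
            # Raw condition agrees with the alarm state: cancel any pending change.
            self._pending_since = None
            return self.active
        if self._pending_since is None:
            self._pending_since = t_s
        delay = self.on_delay_s if condition else self.off_delay_s
        if t_s - self._pending_since >= delay:
            self.active = condition
            self._pending_since = None
        return self.active


if __name__ == "__main__":
    delay = SUGGESTED_DELAY_S["flow"]
    alarm = DelayedAlarm(delay, delay)
    # A raw condition that flips every 5 s never persists long enough to alarm.
    chatter = [alarm.update(t, (t // 5) % 2 == 1) for t in range(60)]
    print(any(chatter))
```

The delay trades nuisance suppression against response time, which is why the text asks you to weigh the process impact before applying one.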
System Development
Developing an AM system is a team-based activity that requires technical know-how as well as diplomacy in dealing with diverse groups of stakeholders. For new systems, follow the guidelines and requirements given by ISA-18.2, IEC 62682 or EEMUA-191:
Alarm philosophy. This is the umbrella document that specifies the processes to be used for each lifecycle stage (discussed below). The focus is to ensure operational or working definitions exist for, e.g., alarm priorities, settings, performance metrics (such as frequencies), design of alarm displays (human/machine interface (HMI)) and MOC. A companion document called the alarm system requirements specification goes into greater detail on specifications.
Identification. Determine the alarms needed for safety, regulatory compliance and smooth plant operation. Some alarms also could be dictated by other activities such as hazard and operability studies, and piping and instrumentation diagram (P&ID) reviews. The key questions to think about are: “Do I really need this alarm? What do I lose if this alarm is not there?”
Rationalization. Review each alarm and develop supporting documentation such as the basis for the alarm set point, corrective action necessary, consequence of inaction, alarm priority and alarm organization. Rationalization likely will enable elimination of many unnecessary or nuisance alarms. You also may find that some alarms need to be added. Results of rationalization typically are captured in a document called the master alarm database.
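To make the rationalization record concrete, here is a minimal Python sketch of one master alarm database entry with a priority lookup. It rests on several assumptions: the AlarmRecord fields simply mirror the documentation items named above, and the 3x3 PRIORITY_MATRIX is hypothetical. The text fixes only the two extremes (high consequence with short response time is high priority; low consequence with long response time is low priority), so the intermediate cells are illustrative and would be defined in your own alarm philosophy.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical matrix: (consequence, time available to respond) -> priority.
PRIORITY_MATRIX = {
    ("high", "short"): "high",
    ("high", "medium"): "high",
    ("high", "long"): "medium",
    ("medium", "short"): "high",
    ("medium", "medium"): "medium",
    ("medium", "long"): "low",
    ("low", "short"): "medium",
    ("low", "medium"): "low",
    ("low", "long"): "low",
}

# Distribution the text attributes to the standards.
TARGET_SHARE = {"high": 0.05, "medium": 0.15, "low": 0.80}


@dataclass
class AlarmRecord:
    tag: str
    setpoint_basis: str
    corrective_action: str
    consequence_of_inaction: str
    consequence: str    # "high", "medium" or "low"
    response_time: str  # "short", "medium" or "long"

    @property
    def priority(self):
        return PRIORITY_MATRIX[(self.consequence, self.response_time)]


def priority_shares(records):
    """Fraction of alarms at each priority, for comparison with TARGET_SHARE."""
    counts = Counter(r.priority for r in records)
    return {p: counts.get(p, 0) / len(records) for p in TARGET_SHARE}
```

Comparing priority_shares(...) with TARGET_SHARE after each rationalization pass shows whether too many alarms are drifting into the high-priority band.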
Detailed design. Broadly put, you must address three major areas in this stage: specifics of alarms (set point, deadband, associated control systems, etc.); particulars of the HMI; and advanced alarming, the need for which will depend upon your process.
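The role of the deadband among these alarm specifics can be shown with a short Python sketch. It assumes a simple high alarm that activates at the set point and clears only once the value falls below the set point minus the deadband; the function names and the sample readings are made up for illustration.

```python
def high_alarm_states(values, setpoint, deadband):
    """Alarm state after each sample for a high alarm with a deadband."""
    active = False
    states = []
    for v in values:
        if not active and v >= setpoint:
            active = True
        elif active and v < setpoint - deadband:
            active = False
        states.append(active)
    return states


def count_activations(states):
    """Number of transitions from inactive to active."""
    previous = [False] + states[:-1]
    return sum(1 for prev, cur in zip(previous, states) if cur and not prev)


if __name__ == "__main__":
    # A noisy signal hovering around a set point of 100 (invented readings).
    readings = [99.5, 100.2, 99.8, 100.1, 99.7, 100.3, 98.5]
    for deadband in (0.0, 1.0):
        states = high_alarm_states(readings, setpoint=100.0, deadband=deadband)
        print(deadband, count_activations(states))
```

With no deadband, the same noise produces repeated activations; a modest deadband collapses them into a single alarm that clears once the value has clearly returned to normal.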
Implementation. It’s not uncommon to find that many alarms don’t perform as designed because of poor installation. This stage of the AM lifecycle involves logical and physical installation — including location of the alarm as well as its testing and commissioning. Operator training also takes place during this step; it should focus on what the operator must know about the alarm and how to respond properly.
Operation. This is the stage in which the alarm system is functioning. You may consider refresher training for the stakeholders.
Maintenance. Periodic repairs and testing are part of the maintenance stage of the lifecycle. Lack of appropriate procedures could lead to alarms that end up shutting down the plant. The key is communication among the parties involved.
Monitoring and assessment. You must regularly check the performance of each alarm and the whole AM system, and compare performance metrics with the ISA-18.2 (IEC 62682) and EEMUA-191 guidelines. Periodic reviews of the results will help you initiate appropriate troubleshooting.
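A few of the routine monitoring calculations can be sketched in Python as follows. The function names are hypothetical, the inputs are assumed to come from an alarm journal export, and the chatter threshold (three or more activations within 60 sec) is an assumption for the example; use the definition in your own alarm philosophy.

```python
from collections import Counter

ACCEPTABLE_PER_10_MIN = 1  # average rate the text describes as acceptable
MANAGEABLE_PER_10_MIN = 2  # rate the text describes as manageable but stressful


def percent_windows_above(counts, limit):
    """Percentage of 10-min windows whose alarm count exceeds limit."""
    return 100.0 * sum(1 for c in counts if c > limit) / len(counts)


def top_bad_actors(activation_tags, n=10):
    """Most frequent alarms, given one tag name per activation."""
    return Counter(activation_tags).most_common(n)


def is_chattering(times_s, min_count=3, span_s=60):
    """True if at least min_count activations fall within any span_s seconds."""
    times = sorted(times_s)
    for i in range(len(times) - min_count + 1):
        if times[i + min_count - 1] - times[i] <= span_s:
            return True
    return False


if __name__ == "__main__":
    window_counts = [0, 1, 3, 2, 0, 5]  # made-up counts per 10-min window
    print(percent_windows_above(window_counts, MANAGEABLE_PER_10_MIN))
    print(top_bad_actors(["A", "B", "A", "C", "A", "B"], n=2))
    print(is_chattering([0, 20, 45]))
```

Tracking these figures review after review makes it easier to see whether the alarm system is improving or deteriorating.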
Management of change. From time to time, you may need to add or remove alarms, modify their set points, deadbands or other parameters, or alter displays. Unless these changes are properly controlled and documented, the AM system will deteriorate. You must review a proposed change from the standpoint of each of the lifecycle stages.
Audits. Their purposes include, for instance, identifying deficiencies in the AM system against the alarm philosophy and potential areas of improvement. Audits are more comprehensive than periodic monitoring and assessment. Audits involve, e.g., management commitment, AM practices, comparison of performance indicators against the standards, MOC, operators’ ability to respond to alarms, and training and documentation.
AM is a tool to enhance safety, productivity and regulatory compliance in a quantifiable manner. It is a multi-discipline activity. Teamwork and vigilance are the key to its success.
GC SHAH is a senior advisor at Wood Group, Houston.