Use of procedures is an integral part of operating a large industrial process to achieve consistent, safe production. Industry regulations, e.g., OSHA 1910, require companies to provide written operating procedures that contain clear instructions for safely executing activities for startup and shutdown as well as normal, temporary and emergency operations. This article discusses some key challenges with procedural operations identified in an analysis of major industry incident reports by the Abnormal Situation Management (ASM) Consortium, www.asmconsortium.org, and recommended practices to mitigate the associated risks.
In the context of this article, the term "procedure" refers to a written document containing step-by-step work instructions to complete a single objective such as starting up a process unit. The common business drivers for use of procedures are to avoid safety and environment incidents, establish efficient and effective operations, and supplement employee knowledge and experience.
The ASM Consortium defines an abnormal situation as an event disturbing a process that requires the operations team to intervene to supplement the control system. This definition is used specifically to distinguish among normal, abnormal and emergency situations from the perspective of console operations. The objective of abnormal situation management is to return the process to normal before safety systems are engaged.
The consortium's focus on the use of procedures has been to examine whether enhancements to the procedure management system might enable operators to more effectively prevent or respond to abnormal situations.
PROCEDURE EXECUTION FAILURES
To better understand how to improve use of procedures, the ASM Consortium conducted a study to investigate procedure execution failure modes associated with abnormal situations [1]. A team examined root causes of failures covered in a previous analysis of 32 process industry incident reports. That earlier analysis [2] indicated that ineffective use of procedures significantly contributed to major incidents and represented 8% of all root causes.
The team assessed whether the procedural failure occurred prior to or during an abnormal situation; those arising beforehand were deemed irrelevant to procedure execution during the abnormal situation. The analysis showed that 40 of the 70 identified procedure-related root causes, i.e., 57%, were linked to procedure execution failures in abnormal situations (Table 1).
How these root causes manifest themselves provides better insight into how to improve operations practices than the more generic root cause classifications [2]. Examination of the 40 identified root causes showed that the most common manifestation was lack of knowledge about appropriate responses to an abnormal situation arising while executing a procedure (Table 2). Next came failure to detect the presence of an abnormal equipment or process mode while executing a procedure, and then lack of understanding of the impact or effect of performing or not performing a procedural action. In total, these three accounted for 87.5% (35 out of 40) of the procedural execution failures under abnormal situations.
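The percentages quoted in the two preceding paragraphs follow directly from the reported counts. A small Python check makes the arithmetic explicit; the counts are those stated in the text, while the variable and function names are illustrative only:

```python
# Counts as stated in the article text; names are illustrative assumptions.
procedure_related_root_causes = 70   # all procedure-related root causes identified
execution_failures_in_abnormal = 40  # those linked to execution failures in abnormal situations
top_three_manifestations = 35        # root causes covered by the three most common manifestations


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


abnormal_share = share(execution_failures_in_abnormal, procedure_related_root_causes)
top_three_share = share(top_three_manifestations, execution_failures_in_abnormal)

print(f"{abnormal_share:.0f}% of procedure-related root causes")  # 57%
print(f"{top_three_share:.1f}% of execution failures")            # 87.5%
```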
Based on this analysis, the study team identified the need for effective procedure content in the following areas to improve operations' performance during abnormal situations:
• Responding appropriately to the occurrence of an abnormal situation in the execution of the procedure;
• Detecting whether equipment or the process is in an abnormal mode and whether there are any latent abnormal conditions;
• Spotting excursions from normal operating range and knowing the indications of the occurrence of an abnormal situation; and
• Understanding the correct impact or effect of a procedural action and the repercussions of not following the procedural instruction.
ADDRESSING THE CHALLENGES
The analysis of common root causes and root cause manifestations suggests a need for improvements not only in the content of procedures but also in the procedure management system itself. Based on this analysis and plant experiences, the ASM Consortium member representatives identified three challenges to reduce the risk of procedure execution failure during abnormal situations:
1. Organizational culture that fails to enforce an effective policy on use of procedures. This is a symptom of a failure to establish a policy that's compatible with the practical realities of the operations work environment. Even if a formal policy is in place, it typically just notes that employees are expected to follow procedures at all times. Often the policy doesn't clearly state whether "following procedures" means personnel may recall the procedure from memory or must have the written procedure in their hands during its execution.
Most plants adopt a pragmatic practice of letting individual operators decide how they will use a procedure document. Any given operator may be expected to know and follow dozens of procedures. The frequency, complexity and potential risks may differ quite significantly among these procedures. However, the typical policy doesn't distinguish between a simple, routine procedure with low risk, such as swapping pumps, and a complex, non-routine plant startup procedure with high risk. Moreover, when left to the discretion of the individual, whether a procedure document is used prior to or during execution can vary substantially.
One of the ASM Consortium's recommended effective practices is to establish a risk-based methodology to classify procedures in terms of usage. The methodology rates procedures based on expected frequency of use, complexity and potential consequences if the procedure isn't followed. Using the risk rating, procedures are classified into three categories:
• Critical — typically low frequency, high complexity and serious consequences;
• Reference — generally moderate frequency, complexity and consequences; and
• Guidelines — usually high frequency, low complexity and minor consequences.
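One way such a classification could be encoded is sketched below in Python. The article gives the three rating factors and the three categories but no scoring scheme, so the 1-to-3 rating scale, the score thresholds and the "serious consequence always means Critical" override are all assumptions made for illustration, not the ASM Consortium's method.

```python
from dataclasses import dataclass
from enum import Enum


class UsageClass(Enum):
    CRITICAL = "Critical"
    REFERENCE = "Reference"
    GUIDELINES = "Guidelines"


@dataclass(frozen=True)
class ProcedureRating:
    """Hypothetical ratings on a 1-3 scale (1 = low, 3 = high)."""
    name: str
    frequency: int    # how often the procedure is executed
    complexity: int   # number and difficulty of the steps
    consequence: int  # severity if the procedure isn't followed


def classify(rating: ProcedureRating) -> UsageClass:
    """Assumed rule: low frequency raises risk, like complexity and consequence."""
    for value in (rating.frequency, rating.complexity, rating.consequence):
        if value not in (1, 2, 3):
            raise ValueError("ratings must be 1, 2 or 3")
    unfamiliarity = 4 - rating.frequency  # invert: rarely used -> high score
    score = unfamiliarity + rating.complexity + rating.consequence  # range 3..9
    # Assumed override: serious consequences always make a procedure Critical.
    if rating.consequence == 3 or score >= 8:
        return UsageClass.CRITICAL
    if score >= 5:
        return UsageClass.REFERENCE
    return UsageClass.GUIDELINES


startup = ProcedureRating("Unit startup", frequency=1, complexity=3, consequence=3)
pump_swap = ProcedureRating("Swap pumps", frequency=3, complexity=1, consequence=1)
print(classify(startup).value)    # Critical
print(classify(pump_swap).value)  # Guidelines
```

A site would replace the thresholds with criteria agreed by its own operations and safety staff; the point of the sketch is only that the rating inputs and the resulting class can be made explicit and auditable.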
From a usage perspective, the policy should state for each classification level whether the procedure requires:
• Reviewing prior to use;
• Having the full document or a checklist on hand during execution;
• Initialing each step following completion; and
• Signing off following completion of all steps or a group of steps.
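The four usage requirements listed above lend themselves to a simple policy table keyed by classification level. The Python sketch below shows one possible shape; which requirements apply to which level is not specified in the article, so the True/False values are hypothetical placeholders, as are all names.

```python
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class UsageRequirements:
    review_before_use: bool
    document_in_hand: bool        # full document or checklist on hand during execution
    initial_each_step: bool
    sign_off_on_completion: bool


# Hypothetical policy table; a real site would set these values itself.
USAGE_POLICY = {
    "Critical": UsageRequirements(True, True, True, True),
    "Reference": UsageRequirements(True, True, False, True),
    "Guidelines": UsageRequirements(False, False, False, False),
}


def missing_requirements(classification: str, done: Iterable[str]) -> List[str]:
    """Return the required practices (by field name) not recorded as done."""
    policy = asdict(USAGE_POLICY[classification])
    completed = set(done)
    return [name for name, needed in policy.items()
            if needed and name not in completed]


# An operator reviewed a Critical procedure and had it in hand,
# but neither initialed the steps nor signed off.
print(missing_requirements("Critical", ["review_before_use", "document_in_hand"]))
# ['initial_each_step', 'sign_off_on_completion']
```

Stating the policy as data rather than prose removes the ambiguity about what "following procedures" means for each class of procedure.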
Finally, the plant management team must monitor and reinforce compliance with the policy to ensure operations team members adopt the new practice. Changing organizational culture isn't simple; it won't happen automatically with the reclassification of procedures. However, a more pragmatic approach to the organization's expectations on the use of procedures will ease the task.
2. Lack of effective methods for determining what abnormal situations procedures should address. This inherently is about understanding the risks associated with failing to execute a procedure as written and which situations might arise that make executing the procedure no longer appropriate. Consequently, addressing the challenge requires the use of a risk assessment methodology.
The ASM Consortium's recommended practices [3] use such methodologies to:
• Establish risk-based criteria for procedural use classification (Guideline 1.5); and
• Conduct a procedure-focused process hazard analysis (PHA) as part of critical review (Guideline 1.7).
We've already discussed the role consequence plays in classifying procedures. Clearly, this requires some type of assessment of potential hazards associated with the failure to execute the procedure. Sites that have done this assessment use their usual PHA or Hazard and Operability (HAZOP) methodology. The classification results can be used to identify the specific procedures that require a procedure-focused PHA to determine the content appropriate for execution during abnormal situations, i.e., per the second guideline (Guideline 1.7).
Evaluating the risk associated with the process and operator actions during procedures can identify ways to manage and control hazards that might result from failures in execution. In performing this activity, the procedure developer must be knowledgeable about past PHA findings and aware of specific engineering controls that improper execution of the procedure might impact. Furthermore, because the original PHA findings might not have considered the procedures in the analysis of risk, the procedure developer must pinpoint risk specifically associated with procedure execution failure.
The strategy of using risk-based assessment methods for procedure development isn't a new concept to the ASM Consortium members. However, using a risk-based methodology specifically to address the procedural execution failures associated with abnormal situations is a new emphasis evolving out of the recent incident analysis study.
These best practice guidelines represent a starting point. However, a gap still seems to exist in the procedure development strategy: determining which abnormal situations or conditions might arise that affect continuation of the procedure.
So, to get a more comprehensive grasp of sources of risk associated with execution failures under abnormal situation management, consider:
• Failure to detect abnormal condition;
• Failure to detect abnormal situation;
• Lack of understanding of impact;
• Lack of awareness of hazard; and
• Inappropriate response to abnormal situation.
In addition, to better consider the implications of an abnormal situation, when examining potential safeguards the reviewers should determine whether an action or actions would allow the procedure to continue or whether it should be aborted.
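A procedure-focused review along these lines could record, for each procedure step, which of the five risk sources was examined, the safeguard identified, and whether the procedure continues or is aborted. The Python sketch below is one hypothetical record layout; the class names, fields and sample findings are invented for illustration and are not taken from the ASM guidelines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class RiskSource(Enum):
    UNDETECTED_CONDITION = "Failure to detect abnormal condition"
    UNDETECTED_SITUATION = "Failure to detect abnormal situation"
    IMPACT_NOT_UNDERSTOOD = "Lack of understanding of impact"
    HAZARD_UNAWARE = "Lack of awareness of hazard"
    INAPPROPRIATE_RESPONSE = "Inappropriate response to abnormal situation"


class Disposition(Enum):
    CONTINUE = "continue procedure after safeguard action"
    ABORT = "abort procedure"


@dataclass(frozen=True)
class ReviewFinding:
    step: str                 # procedure step under review
    source: RiskSource        # which source of risk was examined
    safeguard: str            # safeguard or operator action identified
    disposition: Disposition  # continue or abort once the safeguard is applied


def uncovered_sources(findings: Iterable[ReviewFinding]) -> List[RiskSource]:
    """Risk sources the review has not yet examined for this procedure."""
    covered = {finding.source for finding in findings}
    return [source for source in RiskSource if source not in covered]


# Invented sample findings for a hypothetical procedure.
findings = [
    ReviewFinding("Step 4", RiskSource.UNDETECTED_CONDITION,
                  "Verify valve lineup before proceeding", Disposition.CONTINUE),
    ReviewFinding("Step 9", RiskSource.INAPPROPRIATE_RESPONSE,
                  "Stop feed and notify shift supervisor", Disposition.ABORT),
]
for source in uncovered_sources(findings):
    print("Not yet reviewed:", source.value)
```

Listing the uncovered sources gives reviewers a simple completeness check before the procedure content is finalized.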
3. Insufficient metrics for understanding the causes of procedural failures. This implies a need to enhance incident reporting to provide better information on the weaknesses of the procedure management system. Any solution must address both metric definitions and metric reporting.
Metric definitions should include both lagging and leading indicators. For instance, establish a set of lagging indicators that addresses failures in procedure scope, content or design that stymie procedure execution in abnormal situations. Likewise, create a set of leading indicators that identify failures in management system elements.
Leading indicator metrics should measure whether or not operations teams understand the plant policy on procedure use and whether personnel comply with the policy. These metrics can help address the challenges associated with "Procedure Not Used."
Incorporate the new lagging metrics into a common site reporting system that addresses all process safety incidents and promotes accurate and comprehensive reporting.
Moreover, build the leading metrics into a common site reporting system that encourages accurate and periodic reporting. For example, apply the behavioral safety protocol (assessment, feedback and recommendations for improvement, plus a required number of observations per month) to process safety management interventions on procedures, not just to behavioral safety. This may require adapting the protocol to align with metric needs. Validate the effectiveness of the metrics in terms of reductions in the number of procedure-related incidents (per lagging indicators) as well as procedural deviation observations.
It's also crucial to establish an effective method for analyzing leading and lagging metrics over time to determine systemic failures in procedure development practices.
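As a minimal illustration of analyzing both kinds of metric over time, the Python sketch below tallies, per month, the number of procedure-related incidents (lagging) and the share of procedure-use observations that recorded a deviation (leading). The record layout, field names and sample data are assumptions; an actual site reporting system would define its own.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Record:
    month: str               # e.g. "2024-01"
    kind: str                # "lagging" (incident) or "leading" (observation)
    deviation: bool = False  # leading only: was a procedural deviation observed?


def monthly_summary(records: Iterable[Record]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per month: lagging incident count and leading deviation rate."""
    incidents: Counter = Counter()
    observed: Counter = Counter()
    deviations: Counter = Counter()
    for rec in records:
        if rec.kind == "lagging":
            incidents[rec.month] += 1
        elif rec.kind == "leading":
            observed[rec.month] += 1
            deviations[rec.month] += int(rec.deviation)
        else:
            raise ValueError(f"unknown record kind: {rec.kind!r}")
    summary = {}
    for month in sorted(set(incidents) | set(observed)):
        # Rate is None when no observations were made that month.
        rate = deviations[month] / observed[month] if observed[month] else None
        summary[month] = {"incidents": incidents[month], "deviation_rate": rate}
    return summary


# Invented sample data.
records = [
    Record("2024-01", "leading"),
    Record("2024-01", "leading", deviation=True),
    Record("2024-01", "lagging"),
    Record("2024-02", "leading"),
]
for month, row in monthly_summary(records).items():
    print(month, row)
```

Tracking the two series side by side is what allows a site to test whether improvements in the leading indicator are followed by fewer procedure-related incidents.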
This strategy for a more comprehensive, metrics-based understanding of procedure execution failures in abnormal situations requires effort on two fronts: defining procedure-related leading and lagging indicators, and enhancing the common site incident reporting system.
ACHIEVE EFFECTIVE PROCEDURES
In general, the ASM Consortium analysis reinforces the value of establishing an effective procedure management system for the development, deployment, and maintenance of procedure work instructions. The outlined approach to the analysis of incident reports can provide any organization with a good understanding of the specific ways to improve the procedural management system for better operations performance.
PETER T. BULLEMER is senior partner at Human Centered Solutions, LLC, Independence, MN.
LITERATURE CITED
1. Bullemer, P.T., Kiff, L. and Tharanathan, A., "Common Procedural Execution Failure Modes During Abnormal Situations," J. of Loss Prevent. in Proc. Ind., pp. 814–818, 24 (6), 2011.
2. Bullemer, P.T. and Laberge, J.C., "Common Operations Failure Modes in the Process Industries," J. of Loss Prevent. in Proc. Ind., pp. 928–935, 23 (6), 2010.
3. Bullemer, P.T., Hajdukiewicz, J. and Burns, C., "Effective Procedural Practices: ASM Consortium Guidelines," Abnormal Situation Management Consortium, Minneapolis, MN, 2010.