In this episode of Distilled, we discuss managing upsets and distinguishing between shutdown-related and non-shutdown upsets. A lot has changed over the years. Today, the preference is to shut down during upsets, and advancements like safety instrumented systems enable automatic shutdowns. However, concerns about overreliance and security exist. Proactive measures involve maintaining situational awareness and implementing training drills. Automating routine tasks for console operators improves efficiency and safety. Collaboration among operators, engineers and management is crucial for effectively managing upsets.
Transcript
Welcome to Chemical Processing's Distilled podcast series, Operator Training Edition.
I'm Traci Purdum, editor-in-chief of Chemical Processing, and joining me is Dave Strobhar, founder and principal human factors engineer for Beville Engineering. Dave is also the founder of the Center for Operator Performance. Hey, Dave. Thanks for joining me once again.
Dave: Well, thanks for having me back, Traci.
Defining Process Upsets
Traci: In this episode, we're going to talk about managing upsets. There are few certainties in life, but knowing at some point an upset is going to occur is one of them, so what I want to understand right up front is, how do you define and categorize upsets?
Dave: Well, I like the way you say that eventually, one's going to come because what we find in the process industries is you really have two distinct types of upsets. I have shutdown-related upsets, and then I have upsets in which I'm no longer in steady state, but I'm also not shutting down, and operators will routinely say that the latter, the non-shutdown upsets are far harder to manage than just a simple, "The unit has tripped, and now we have to shut it down." The problem with that, and what some of the major oil companies have found, is that trying to manage those non-shutdown events is very difficult because it's very chaotic. The information's constantly changing. And so, in those events, making decisions, critical decisions, becomes very difficult because I don't have all the information, and that's why these managers have said, "We don't even want to go through those upsets anymore. We want to just shut it down. Stabilize the plant, get the resources you need, make decisions when things are calmer, and bring the plant back up."
That's a major shift since I have been in the industry. Early on, operators were rewarded for not shutting down the unit, for, "Well, we're going to manage this, and we're going to hold on by our fingertips," and, "Wow, we didn't have to shut down." Now the general consensus is don't go through those, and design your plants so that you know you're going to shut down. Like you said, you know you're going to have an upset, you know you're going to shut down, and that is the critical item. I often use the example that having a plane take off is an option; landing a plane is mandatory. The same thing here: you have to be able to safely shut down the plant, and so that really is probably the fundamental focus or concern of every major company. "Can I safely shut down my unit, and be assured that that's going to happen when the inevitable..." As you said, you don't know when it's going to happen, but the inevitable upset is going to occur and force you into that situation.
Common Causes for Upsets
Traci: What are some of the common causes of the upsets? What would constitute an upset that would require shutting down? There are some things that probably wouldn't, right?
Dave: Correct, and there's a gray area between the two. Most significant upsets are loss of utilities of some kind: power, steam, instrument air, cooling water, or some sort of critical resource, like hydrogen, that may be coming in from an external supplier. If I lose those utilities, then pretty much everybody is going to be shutting down. Within the process unit itself, again, another change in my 40 years in the business has been the use of safety instrumented systems, so that now you have fault-tolerant systems that will sense when the process is getting into a dangerous area and just shut the process down, taking it out of the hands of the operator.
When I first started, it was up to the operator with a fired heater to decide, "I want to trip that heater, I want to take that heater down." And so they've taken the human out of that decision loop, hopefully, if they haven't bypassed the safety system, by putting these fault-tolerant systems in, so that we're not going to leave it up to the human, at a time when things are not particularly stable, to make that decision. The safety instrumented systems are going to sense that the plant is in or approaching some sort of high-risk situation, take it out of the operator's hands, and just trip the unit, trip the plant, and initiate that shutdown, so that the operator now has to transition from operating mode to shutdown mode.
Security Concerns with Plant Shutdowns
Traci: Are there any security issues with that? I mean, that seems a little worrisome, right? Could nefarious characters come in and shut down your plant?
Dave: Yes, and there's also probably... Less nefarious is just that there are issues that have been found in other industries. That when you put these protective layers on the system, there is a tendency to run closer to that limit. In other words, when the operator was the person going to be making the decision, I might make a... "Okay. You're going to operate normally at 490 degrees, but when you hit 500 degrees, I want you to shut down." Well, I put in these safety instrumented systems, so now I'm going to run at 499 degrees because, well, I've got that safety system there. I can run closer to that. That, of course, assumes that the safety system works, and so they tend to run closer to the limit. Probably one of the original incidents with this actually had to do with ships sailing into a port.
There were a number of collisions that were occurring, and so they put this radar in to manage the ship traffic to prevent these collisions. What happened is, of course, well, yeah, it did a good job of that, but then because it did a good job, the ships started going faster because they weren't as concerned about a collision because, "Well, I've got that radar system in there," and of course, collisions actually increased as a result of it. The concern is, and every plant has this, can somebody go in and hack the system? Then probably, the real fear, and I've seen the software that would do this, is it would lie to the operator and make him think everything's running fine, but disable that safety system and then push the process into that. There are risks associated with it, although I think, on the whole, the benefits exceed the risks that you're generating, but it's not risk-free.
Preventing Process Upsets
Traci: Now, we're talking about the occurrence of upsets and having that shutdown, but are there proactive measures that can be implemented to prevent upsets from occurring in the first place?
Dave: Yes. There was an event that occurred fairly recently, and it comes back to something we've talked about before: ensuring that the operator in charge, the console operator, maintains their situation awareness so that they understand what's happening under their span of responsibility. What we have found numerous times is that a console operator will get tied up with a minor problem, one of these minor upsets that we're talking about. Their focus goes all onto that and something independent happens. Because they're so focused on this little internal upset, they fail to see that another part of their plant is degrading, and by the time they realize it, it's too late. At this plant recently, they lost a piece of major equipment and the console operator's focusing on that. Another area under his span of responsibility develops what began as not a significant problem.
You would say, "Well, that's no big deal. There's all sorts of ways to correct for that." Well, he didn't detect it because he had gotten this tunnel vision, and he's focusing on the upset in that particular unit. And so that little minor problem starts magnifying, magnifying, until finally they plugged up a major section of the plant. Now they're down for weeks trying to get this thing unplugged, and that was because the operator lost their situation awareness. They got tunnel vision. Situation awareness is kind of the opposite of tunnel vision; it's the big picture of what's going on. At some point, the operator can say, "Yes, this section of the plant is upset, but I'm going to leave it alone and rely on these shutdown systems, or whatever I have, because I've got something else that's building here, and I need to take action in that particular area." So probably the biggest way to avoid these upsets is ensuring that the person controlling the process has that big picture. They know the health of what's under their span of responsibility without getting caught up in a particular problem in a particular area.
Training to Manage Upsets
Traci: Talking about the person controlling the process having the big picture, but what about training other plant personnel in handling emergency situations during this type of event?
Dave: Well, and again, this is another change I've seen over my career. Early on, you had shutdown procedures, and we would routinely walk through the shutdown procedures with the operators, and we were looking for ways, out in the field, to enhance their response. Were there valves that were too big, or equipment that you had to climb out on a pipe rack to get to? These were real events that were happening. I was with an operator and we were having to do a lineup change, and so we're at a manifold of, I don't know, a dozen valves, and he's having to realign items. He's standing back and trying to figure it out. He knows the event. He's trying to figure out which valves he needed to manipulate to achieve the steps in the procedure, and he's kind of staring at it for a while, and I'm thinking, "We're out here at 10:00 AM on a nice sunny day."
If this was 4:00 AM in the freezing rain, what's the error probability difference between those two? And so one of the things that came out from that exercise, and a number of companies have instituted it, is what they call Red Tag Drills. Rather than just reading through the procedure in the control room and everybody going, "Yeah, yeah. I know that, I know that. Yeah, that's there," you actually take, literally, a red tag and say, "Okay, boom. We are shutting down because of a momentary power failure. Go out and put this red tag on every piece of equipment you would manipulate." And so the operators go out and they attach the tags, with the idea being that all of a sudden it becomes sort of a reality thing. They notice things that they didn't consider when just reading the procedure.
When you're out and you're going to put that tag on it, and you're going to have to walk out that pipe rack to make it happen, and you're like, "Wow. Technically, I'm supposed to have a harness and be tied off in order to do this." Again, this 4:00 AM, freezing rain. "Maybe we need to make a change here? Maybe we need to do something?" One of the big improvements that has occurred, and should be occurring, at every plant is to conduct these red tag drills for the field operators. Go out and actually touch what it is you would do, so my operator who couldn't figure out which valves to manipulate on a nice sunny day, "Go out and put this red tag on the valves that you would manipulate."
And so then the supervisor, the trainer, could come back later, and they can talk about it. "Did you put it on the right valves? Did you get every step in the procedure?" That is a huge gain for field operator performance. We've talked in the past, Traci, about the four elements of training: instruction, demonstration, practice, and feedback, the most important being that practice. Well, this is practicing. Shutting down the unit is that key aspect of training the field personnel to be able to respond to these upsets.
Traci: Maybe you should add a fifth one, immersion, and have somebody out there with a big fire hose with ice-cold water blowing on them while they're trying to put these red tags out.
Dave: I think there might be some ethical concerns on that one.
Automation Systems
Traci: I agree. We talked about safety instrumented systems and being able to execute the shutdown, but are there other technologies or automated systems that can assist in detecting and responding to process upsets more effectively?
Dave: Well, the tool is actually there. It's surprising the number of plants that... As we said, you know you're going to have an upset. You've put in these safety instrumented systems so you know it's going to shut down, and yet when it comes to the console operator, they've done very little to make their job easier. It's just kind of like, "Well, we don't know what's going to happen, so we're just going to throw it at that person and let their training take over." It's like, "No, no, no." You know when these events are going to happen. They've happened in the past, so you have data as to what happened. Don't just pretend that this has never happened before. Learn from these past events, and you can automate. At most plants that are on a distributed control system, you can implement programs for the operator that will make their life easier.
One of my first situations where I ran through this, I'm with a console operator, and again, we had this little... We were going through this power failure, power blip sort of scenario, and he's like, "Well, okay, I need to get the heat out of these towers that I'm responsible for because we don't have any more cooling, so I want to get the heat out," and he went to this display, and he says, "Well, I go to the temperature controller here, and I put it in manual, and I set it to zero. Then I go to this next display, and I put this temperature controller in manual, set it to zero," and this goes through for five different displays to make these changes, all to accomplish the task of getting the heat out of that plant.
I'm thinking, "You know, this is basically a computer system." You can put in a macro. You can put in a command that if you want the heat out of all those towers, you should just have to hit one button, and it's going to go through, and it's going to take the heat out of all those towers, so all those actions the operator was doing are easily automatable within the control systems that are there, so no special tools. It's already ready to go. It just requires somebody to identify that, "Hey, the operator's making all these actions. Let's take that away from them," because we know they're going to do it. This is not a discretionary item on the part of the operator. If I lose cooling water, I have to take the heat out. This is not a, "Oh, should I, or shouldn't I?"
You could actually tie that all together, so the minute the cooling water is lost, it could push that button for you and take the heat out, and that's kind of a decision in terms of automation by consent. In other words, "Hey, I want to take the heat out. Do you agree, yes or no," or, "No, the operator has to initiate it," so there's a sort of control issue there associated with it. But either way, having that automation available to perform these rote tasks that the console operator would otherwise have is an immense improvement, because you no longer have to worry about those tasks being completed. My operator that was going through those different displays, he's gone through two of them and he gets distracted and he goes off to take care of something else, and now he's forgotten he didn't finish the other three, so I've got a potential error source there.
This is also taking time and attention, that he's having to do this, so if I automate it, you hit that button, it's taking that heat out, they can spend their time assessing what is going on. Because that's what the power of the human is, right? It's our diagnostic ability, our troubleshooting, and I can't do that if I'm tied up making these sort of rote control changes. So by automating it, not only do I ensure the reliability that those tasks are going to occur, I free up mental resources on the part of the console operator to be able to assess, "Is this upset going the way I want," and so tie back into that whole situation awareness. This gives them the time to assess their span of responsibility and be able to say, "Okay. Yeah, this shutdown is going the way I would expect it to," or, "Uh-oh, something isn't right over there. I need to go investigate because it's not responding the way I want it," so there are a lot of existing tools that can be applied to make managing those upsets easier, more reliable, and safer.
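The one-button "heat out" macro Dave describes can be sketched in a few lines. This is a hypothetical illustration, not any vendor's DCS API: the controller tags, the dictionary-based controller model, and the `take_heat_out` and `confirm` names are all invented for the example, but the logic follows his description, setting every temperature controller to manual with zero output in a single command, optionally gated by operator consent.

```python
def take_heat_out(controllers, require_consent=False, confirm=None):
    """Set every temperature controller to manual with zero output.

    With require_consent=True ("automation by consent"), the operator
    must approve via the confirm callback before anything changes.
    """
    if require_consent:
        if confirm is None or not confirm():
            return []  # operator declined (or cannot be asked); no changes made
    changed = []
    for ctrl in controllers:
        ctrl["mode"] = "MANUAL"   # take the controller out of automatic
        ctrl["output"] = 0.0      # drive the heat input to zero
        changed.append(ctrl["tag"])
    return changed  # a record the operator can review afterward

# Example: five tower temperature controllers, as in Dave's story.
towers = [{"tag": f"TC-10{i}", "mode": "AUTO", "output": 55.0}
          for i in range(1, 6)]
acted_on = take_heat_out(towers, require_consent=True, confirm=lambda: True)
print(acted_on)  # all five tags handled in one action instead of five displays
```

In the consent variant, the same function could be triggered automatically on a loss-of-cooling-water signal, with `confirm` wired to a yes/no prompt on the operator's display.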
Traci: What about the collaboration and getting that all in? The operators, engineers, management, making sure that they're all collaborating together to get to that point? Are there good ways for collaboration, I guess is what I'm asking?
Dave: I think the key thing here is the collaboration, and that requires a good environment. The operators can't do this alone. They can't go in and program and make this happen, and so it's going to require the supervisors to understand, and I guess this would be the fundamental shift: we need to manage upsets. When I first started in the industry, it was kind of like you were saying. "Hey, upsets happen," and it was just kind of this chaos. "Oh, we react," a very reactive mode. "Well, whatever happens, we'll take care of it," and so on.
The supervision and management needs to change their focus and say, "We need to manage these upsets." Yes, they don't occur very often, but when they do, the risk potential is huge, so management needs to start this collaboration and say, "Hey, we need to manage upsets. What resources do we need to bring together to manage that," and that's where getting the operators, getting the engineers, getting the trainers together and say, "Okay. Let's analyze what's going on here, and let's see how we improve it." Because as you indicated, no one person can do this. It has to be done as a collaboration, and it has to be coming from management to say, "We're going to manage upsets." We're not going to be in this reactive mode that, "Well, yeah, it isn't going to happen very often, but I'm sure you can handle it when it does." It's like, "We know it's going to happen at some point, and when it happens, we're going to be ready for it."
Traci: Dave, do you have anything you want to add?
Bring the Plant to Stable Condition
Dave: I think this issue, as I said at the start, being able to safely bring the plant to a stable condition, that should be the fundamental goal, the number one priority in operations. For many years, for a variety of reasons, companies just kind of hoped that was an achievable goal, and I think people are realizing now that no, we need to make that an explicit goal and do the due diligence to make sure, "Yeah, we can achieve it. We've walked through these events outside, and we know the field operators know what to do and can do it. We've analyzed what the console operator has to do. We know they can manage it even in a very chaotic state." That shift from the reactive mode it used to be to this proactive mode, where "We know it's going to happen, let's plan for it," I think has been a very valuable shift, and one that not everybody has quite caught on to yet.
Traci: Well, Dave, thank you for always helping us be proactive, giving us the big picture, and getting us into the stable conditions.
Want to stay on top of operator training and performance? Remember to subscribe to this free podcast via your favorite podcast platform to learn best practices and keen insight. You can also visit us at chemicalprocessing.com for more tools and resources aimed at helping you achieve success. On behalf of Dave, I'm Traci, and this is Chemical Processing's Distilled podcast, Operator Training Edition. Thanks, Dave. Thanks for listening.
Dave: Thanks, Traci.