Podcast: Deadly Lessons Learned From A Permit-To-Work Failure
Transcript:
Traci: Welcome to Process Safety with Trish & Traci, the podcast that aims to share insights from past incidents to help avoid future events. I'm Traci Purdum, executive digital editor with Chemical Processing. And as always, I'm joined by Trish Kerin, the director of the IChemE Safety Centre. Hey, Trish, how are you?
Trish: Hi, Traci. I'm doing really well today. How are you doing?
Traci: Doing just fine. Getting a lot of work started. You know, the first month of the year is usually a little bit slower, but now we're getting into the groove of all the things moving and webinars and all sorts of things happening. So getting busy.
Trish: Absolutely. And I'm looking forward this year to attending face-to-face conferences again. So, that's quite an exciting future I see, hopefully.
Traci: That's been a long time coming. It's going to be strange, don't you think?
Trish: Oh, it will be. I'm not quite sure how it's all going to feel, but I'll give it a go. I hope to get over to Texas in April. That's for the Global Congress on Process Safety.
Traci: Well, in this episode, we are addressing permit-to-work systems, which are used to ensure that work is done safely and efficiently. Permit-to-work adherence is essential in process safety management. However, it goes a little bit deeper than just instructions about safety. Can you give us a more in-depth definition?
Trish: There are a lot of definitions around about what a permit to work contains. And as you know from our conversations, I like to keep things a little bit simpler, so that everybody can understand the main point. The way I like to think about a permit-to-work system is it's about controlling the ownership and activities of a particular piece of equipment at a point in time. Let's say you've got a pump in a plant somewhere. While the pump's in service, it's owned in a certain way by the operators. They operate it. They look after it. They might do some minor little preventative maintenance on it. They might top up the grease or something like that on the pump. What the permit does is formally transfer ownership and accountability for that piece of equipment from the operations team to the maintenance team when that maintenance needs to be done.
So the maintainers say, "I want to work on that pump," the operators say, "Okay, we'll isolate it, and we will give you documentation and show you evidence that that pump is isolated. And then we will document that we've handed control of that pump over to you right now." The maintainers can go and do their work, after they've accepted that control and understand the hazards. And then at the end of the job, they formally hand the ownership back to operations. Operations can then deisolate and restart the pump. And so, it's this clear way of making sure everybody knows who's in charge at any particular point in time with a piece of equipment so that we can keep control of all the safety aspects and all the hazards we need to think about when we are working with that equipment.
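The handover sequence described above (isolate, hand over, hand back, deisolate and restart) can be pictured as a small state machine. The Python sketch below is illustrative only and is not from the podcast; the `Equipment` class, the `Owner` enum, and the tag `P-101` are hypothetical names chosen for the example.

```python
from enum import Enum, auto


class Owner(Enum):
    OPERATIONS = auto()
    MAINTENANCE = auto()


class Equipment:
    """Hypothetical model of who controls a piece of equipment right now."""

    def __init__(self, tag):
        self.tag = tag
        self.owner = Owner.OPERATIONS
        self.isolated = False
        self.log = []  # documented record of each handover step

    def isolate(self):
        # Operations isolate the equipment before any handover.
        if self.owner is not Owner.OPERATIONS:
            raise RuntimeError("only operations may isolate")
        self.isolated = True
        self.log.append("isolated by operations")

    def hand_over(self):
        # Control passes to maintenance only once isolation is in place.
        if self.owner is not Owner.OPERATIONS or not self.isolated:
            raise RuntimeError("isolate before handing over")
        self.owner = Owner.MAINTENANCE
        self.log.append("handed over to maintenance")

    def hand_back(self):
        # Maintenance formally return control at the end of the job.
        if self.owner is not Owner.MAINTENANCE:
            raise RuntimeError("maintenance does not hold this equipment")
        self.owner = Owner.OPERATIONS
        self.log.append("handed back to operations")

    def restart(self):
        # Only operations, holding ownership, may deisolate and restart.
        if self.owner is not Owner.OPERATIONS:
            raise RuntimeError("cannot restart: equipment is under a permit")
        self.isolated = False
        self.log.append("deisolated and restarted")


pump = Equipment("P-101")  # hypothetical tag
pump.isolate()
pump.hand_over()
# pump.restart() here would raise: maintenance still owns the pump.
pump.hand_back()
pump.restart()
```

The point of the sketch is that a restart is refused while maintenance holds ownership, which is the kind of check a documented handover is meant to provide.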
Traci: Now, what must the permits contain, and who needs to be involved? You mentioned the operator and the maintenance crew. What else needs to be in there?
Trish: So, there's some fundamental things that just have to be in a permit. There has to be the scope of the job that's being done. And the reason the scope is so important is because this is a formal handing over of control. If I'm handing over control for you to do a particular job, and you then as the maintainer decide that you need to do something else, well, I may not have prepared that for you. I haven't given control of that equipment for you to do something else. So, defining the scope and sticking to the scope is critically important because all of the hazard identification and risk assessment you'll do is all going to be based on this defined scope of what the work is. So we always need to very clearly define specifically what the task is that's being done. It might be to remove a pressure relief valve and replace it with another one, while we take this one away for testing, that might be a standard scope.
Now, if the scope of that job changed, and the maintainer was unable to install a new relief valve, well, that permit's no longer valid, as the permit said remove and replace. So if you are only removing and you're not replacing, then that permit is no longer valid. The scope is critically important. The status of the operating plant and the status of its isolation are also really important to document. How is it isolated? Is it fully isolated from the pipework? Electrically? Are there any other hazards in there? Is it drained? All of this information needs to be contained in a permit. Some of the standard hazards and conditions need to be noted as well. So there will be standard hazards that we have all the time in a particular part of the plant. They all need to be on the permit, because the people doing the work may not be as familiar with that bit of the plant as the operators are, for example. They need to know what those hazards are, and then any specific job-related hazards also need to be there, along with what the controls are: what we will do to try and mitigate or prevent anything going wrong.
So it does form quite... It needs a lot of risk assessment associated with it, and these aren't big complex hazards or anything like that. But they are actually just, you know, identifying the hazards and making sure we implement controls to prevent an incident occurring during that work. And importantly, the permit has to have that authorization, that formal handover aspect. So, certain people in an organization will be trained to sign and approve a permit. Other people in an organization will be trained to sign and receive the permit. And then likewise hand it back, and then have the equipment received back into the plant. And so, there needs to be a level of competence associated with the people that are approving permits, the people that are doing the work, and receiving the permits as well. So these are some of the really key, important aspects.
It's also important to remember that there are different sorts of permits. A permit's not just a permit. I could give you a permit to go and do some cold work, which might be to remove that PSV and replace it, because there's no ignition source. There's no hot work associated with that. But if you need to do some welding in the plant, for example, or grinding, that's hot work. Or it could even just be that you need to take some photographs using a camera: that's a potential ignition source, so that's hot work. So, there's a different sort of permit that would need different monitoring controls, because we need to make sure that if we have an ignition source, we don't have a fuel source waiting for the two to meet. Confined space permits are the other really common permits that we see: making sure it's safe to go into a confined space, monitoring the safety of the person in there, and safely getting them out.
So that needs emergency response plans associated with it too, rescue plans. So, there's a whole lot of different permits. Excavation permits. If you want to dig a hole, you actually should have a permit to dig a hole. You want to make sure you're not going to dig through electrical cables or pipework when you do it, as well as make sure the hole is safe and it's not going to cave in. So, there's all sorts of different permits that are needed in a facility to make sure that we maintain and manage that control of the site or the equipment, and its handover, the work that's done within the scope, and then the safe handing back.
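The permit contents and permit types listed in this answer can be summarized as a simple record plus two checks. This is a minimal Python sketch under stated assumptions, not a real permit system: the `Permit` fields, `still_valid_for`, and `required_permits` are hypothetical names, and the rule that the planned tasks must match the scope exactly is a simplification of the "remove and replace" example.

```python
from dataclasses import dataclass


@dataclass
class Permit:
    scope: frozenset        # exact tasks covered, e.g. {"remove PSV", "replace PSV"}
    isolation_status: str   # how the equipment is isolated, drained, etc.
    standing_hazards: list  # hazards always present in that part of the plant
    job_hazards: dict       # job-specific hazard -> control
    approved_by: str = ""   # person trained to approve permits
    received_by: str = ""   # person trained to receive permits

    def is_authorized(self):
        # Both sides of the formal handover must have signed.
        return bool(self.approved_by and self.received_by)

    def still_valid_for(self, planned_tasks):
        # Assumption: any change to the planned tasks invalidates the permit.
        return self.is_authorized() and frozenset(planned_tasks) == self.scope


def required_permits(ignition_source=False, confined_space=False, excavation=False):
    """Pick permit types from the categories mentioned in the transcript."""
    permits = ["hot work" if ignition_source else "cold work"]
    if confined_space:
        permits.append("confined space")
    if excavation:
        permits.append("excavation")
    return permits


permit = Permit(
    scope=frozenset({"remove PSV", "replace PSV"}),
    isolation_status="isolated from pipework and electrically, drained",
    standing_hazards=["hydrocarbon area"],
    job_hazards={"residual pressure": "vent and verify zero pressure"},
    approved_by="operations authority",
    received_by="maintenance lead",
)
print(permit.still_valid_for({"remove PSV", "replace PSV"}))  # True
print(permit.still_valid_for({"remove PSV"}))                 # False: scope changed
print(required_permits(ignition_source=True))                 # ['hot work']
```

A real permit form carries far more detail; the sketch only shows that scope, isolation status, hazards, and signatures travel together, and that a changed scope means a new permit.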
Traci: So how is this policed? You mentioned that somebody signs off on it. Is there also time constraints with it? So, if a permit is submitted, then is there a time constraint when it needs to happen? And I guess what I'm getting at is how are the checks and balances happening?
Trish: Yeah. So permits do usually have a timeframe associated with them as well. Usually, a permit shouldn't go any longer than just one shift of work, so that it's contained within the one workgroup that are all part of the handover process. Sometimes a permit may only be given for a shorter period of time. You've got four hours. At the end of four hours, we need this job completed for various reasons. And so the permit will cease at that point. Permits also actually are automatically suspended in the event of an emergency incident on the facility. So you can't just... if there's an emergency muster called, and then the all-clear is given, you can't just go straight back to work because you've got a permit. Your permit is no longer valid at that point. It needs to be revalidated because the situation, the conditions may have changed because of the emergency that occurred.
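The two validity rules in this paragraph, a fixed time window and automatic suspension during an emergency, can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of any real system: `PermitClock` and its method names are hypothetical, and the four-hour window is taken from the example above.

```python
from datetime import datetime, timedelta


class PermitClock:
    """Hypothetical tracker for whether work may proceed under a permit."""

    def __init__(self, issued_at, duration):
        self.expires_at = issued_at + duration
        self.suspended = False

    def emergency_declared(self):
        # Any emergency on the facility suspends the permit automatically.
        self.suspended = True

    def revalidate(self):
        # Called only after conditions have been re-checked following the
        # all-clear; the all-clear alone does not lift the suspension.
        # Revalidation does not extend the original time limit.
        self.suspended = False

    def work_allowed(self, now):
        return not self.suspended and now < self.expires_at


issued = datetime(2022, 2, 1, 8, 0)
clock = PermitClock(issued, timedelta(hours=4))

print(clock.work_allowed(issued + timedelta(hours=1)))  # True
clock.emergency_declared()
print(clock.work_allowed(issued + timedelta(hours=2)))  # False: suspended
clock.revalidate()
print(clock.work_allowed(issued + timedelta(hours=3)))  # True again
print(clock.work_allowed(issued + timedelta(hours=5)))  # False: permit expired
```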
So there are various reasons why a permit may be stopped. It can be extended over multiple shifts, but you need a structured process to make sure a documented handover takes place between each shift so that everybody knows what's going on. And we know that that equipment is still being kept in a safe state. And the way that this is checked is there needs to be, first of all, a very structured permit procedure in a facility. And that needs, as I said, people to be competent and trained in it, and then people need to follow that procedure. A permit-to-work system is often a critical control in a major hazard facility as well. We try not to have administrative controls as critical controls when we're trying to prevent major incidents. But the fact is that a permit to work is one of those critical controls and it is administrative only.
And that means we need to monitor its application. And there's a few different ways to monitor the application. One of the most common ones I see is spot checks where you have a process that says, you know, if I've got 10 permits, then every day I will physically inspect three of those permits. And I will make sure that the permit has been completed as per the procedure. So I'll have a checklist that tells me the things I need to check on the procedure, you know, is the scope clear? Has the job kept to the scope? Or has it crept out? And so, it involves looking at the paperwork. It involves looking at the job site. Talking to the people doing the job. And you also should check permits that have been completed to make sure the close-off process works as well. So that would involve checking a permit that's completely done, and going through the checklist, and making sure that everything has been done as the permit says it should.
And then you should be reporting the number of permit checks that you do because that then shows that you're checking the system. And if you're reporting the number of checks that you do, you should also report the percentage compliance to the procedure. And if you are constantly seeing that there's one part of the procedure that no one's complying with, you've got a problem you need to fix in your procedure. If no one's complying with one particular part of it, there's a problem with the procedure. If you've got one person that's not complying with a particular part of the procedure, you need to understand why. It may not be the person's fault. Remember it's often not the person's fault. They do things for very good reasons. We need to understand the reason they do things and resolve that issue because we have these systems in place to protect the safety of the facility, and importantly, the people interacting with that facility.
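The spot-check reporting described above (number of checks, percentage compliance, and which part of the procedure keeps failing) could be tallied roughly as follows. This Python sketch is an assumption-laden illustration: the checklist items are invented, and "percentage compliance" is taken here to mean the share of checked permits that pass every checklist item, which may differ from how any published guidance defines the metric.

```python
from collections import Counter


def summarize_checks(checks, checklist):
    """checks: list of dicts mapping checklist item -> True if compliant."""
    total = len(checks)
    fully_compliant = sum(all(c[item] for item in checklist) for c in checks)
    # Count failures per checklist item to spot a step nobody complies with.
    failures = Counter(
        item for c in checks for item in checklist if not c[item]
    )
    return {
        "checks_done": total,
        "percent_compliant": 100.0 * fully_compliant / total if total else 0.0,
        "failures_by_item": dict(failures),
    }


# Hypothetical checklist and four spot-check results.
checklist = ["scope_clear", "scope_followed", "closed_off"]
checks = [
    {"scope_clear": True, "scope_followed": True, "closed_off": True},
    {"scope_clear": True, "scope_followed": False, "closed_off": True},
    {"scope_clear": True, "scope_followed": False, "closed_off": True},
    {"scope_clear": True, "scope_followed": True, "closed_off": True},
]

report = summarize_checks(checks, checklist)
print(report["checks_done"])        # 4
print(report["percent_compliant"])  # 50.0
print(report["failures_by_item"])   # {'scope_followed': 2}
```

A failure count concentrated on one checklist item across many people points at the procedure; a failure tied to one person is a prompt to ask why, not to assign blame.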
So we produced a guidance document in the Safety Centre several years ago now on lead process safety metrics. And we talk about how to calculate permit checks to schedule. And we also talk about compliance to a procedure for permit checks as two of the metrics in that document. And late last year, we released a supplementary guidance document that actually looked at more in-depth monitoring and assurance of permit-to-work systems. So we've got this guidance document on permit-to-work systems and how to check and monitor that process.
So there's some really good information in there. The other thing that happened this month is that this month's Process Safety Beacon from the Center for Chemical Process Safety, the CCPS, is actually all about permits. So get yourself a copy of that and have a read of it. They talk about things like creep on the scope and all that sort of stuff. So, there's a lot of great information out there on what should be in a permit, and how to monitor the system, and how to make sure it's working for you.
Traci: And I will offer a link in the transcript of this so that folks can get to that easily. What is the purpose of permit to work aside from the safety aspect?
Trish: Aside from the safety aspect, it also helps you understand what equipment is available for you at any point in time. So it can have some quite significant production implications for you. So if you've got a pump out of service for maintenance, and so there's a permit that's got that pump isolated, and you've got a spare pump that you're currently running because it's a spared system. If your spare pump breaks down, then having your permit to work system clearly documented will help you understand whether you can restart your other pump. Is there something wrong with it that it can't be restarted, or can it be put back into service at this point in time? So permit-to-work systems actually help make it clear for everybody what status your equipment is in, so that you know how you can work with the equipment you've got to make sure you can continue to maintain production, obviously, in a safe way. Its fundamental purpose is safety, but there are other benefits to permit systems as well.
Traci: Now let's talk about some failures, some permit to work failures, and how failures can be avoided. The big one that comes to mind is the Piper Alpha as an obvious example.
Trish: Yes. So, Piper Alpha is probably one of the most prominent permit-to-work failures that we've seen in the history of process safety. One hundred and sixty-seven men perished on the Piper Alpha oil rig when it sank to the bottom of the North Sea. And that was a clear permit to work failure. So what was happening on that particular day? They were doing some maintenance work where they were removing and replacing pressure relief valves on the rig. And they had removed the pressure relief valve on a particular pump, with the full intention that they were going to replace it after it had been tested and made sure it was in good working order. But the testing of it got a little bit delayed because they couldn't get hold of the quality assurance person to verify the test to make sure that the relief valve was safe to go back into service.
So what the fitter did was, instead of properly fitting a blank flange, they fitted a blank flange but only finger-tightened the studs on the flange. So it was literally an open piece of pipework coming out of this pump. Then a shift change occurred, the PSV hadn't been put back in place, and the other pump failed. So they needed to put the first pump back in service. There wasn't a clear handover between the two shifts as to the status of the pump and the fact that it was actually racked out electrically. It was isolated, racked out, and not in service. But the handover did not make clear that there was no pressure relief valve on it. And in fact, it wasn't even a secured pipe.
They did rack the pump back in and turned it on because they needed the pump in service because of the issue with the other pump. That created a gas leak, that gas leak then exploded. And it took out the control system for the platform. It took out the control room. It took out fire pumps and created a massive fire that burned for several hours. The crew were unclear as to what to do, so most of them sheltered in place in the cafeteria, which is where they were meant to go to shelter in place. No one took clear leadership for an evacuation. There were several survivors of Piper Alpha. And the interesting part of the story is all the survivors actually jumped off the burning platform into the burning sea and were rescued by support vessels that came to assist.
So those that survived didn't follow what the emergency response plan said. Those that followed it actually died in this instance because the emergency response plan was very, very inadequate for what was going on. But the big issue here was that there was a systemic failure in the organization to have an adequate permit-to-work system that clearly defined ownership of the equipment, status of the equipment, and handover, in particular, between shifts when the permits stretch over one shift and into another. There were also obviously issues around permit scope. I said earlier if you're going to remove a relief valve and the permit says, remove and replace, then you remove and you replace. You don't remove and blank flange finger tight only. That's a scope creep in a permit, which is not following what the scope actually says. So, Piper Alpha is one of these examples where we learned a lot more about a permit-to-work system and about what should be done, and how it should be done.
And so it's certainly a case study that's worth reading or worth getting hold of to see just what went wrong in the permit system and how it was applied. There were many other issues in Piper Alpha. It had changed from being an oil platform to a gas and oil platform. And it had fire-rated walls, but not blast-rated walls in it, and those sorts of things. So when the blast occurred, it took out parts of the control center, as I said. And so, there were those sorts of issues associated with it. Other issues were that it was also connected to a series of other platforms in the North Sea. And it was the hub where the other platforms pumped their product into and Piper Alpha pumped it to the shore. When all the other platforms lost communication with Piper Alpha, and they could see that there was a fire on the horizon, nobody had authority to stop pumping oil and gas to them. So oil and gas continued to be pumped to the burning platform.
Now, the amount of oil and gas that was already contained within the pipelines had to depressure somewhere. So it probably wouldn't have made that much of a difference perhaps, the fact that they kept pumping. There was enough oil and gas that was already going to get to Piper Alpha for its destruction, but it says a lot about, you know, making sure we do have this ability to stop work when we think something is unsafe, rather than literally looking at a burning platform on the horizon and not having the authority to stop pumping oil or gas to it.
Traci: Is there anything else that we need to know about this topic?
Trish: I think the main message I'd like to leave you all with is that permit to works are absolutely a critical safety element in your facility. And you don't have to be operating a major hazard facility to have a permit to work system. Obviously, if you are operating an oil refinery or a chemical plant, your system might be quite complex. If you're operating a small production plant, a small batch plant, a simple permit system is something that you actually should have, whether you are covered under OSHA 1910 or not. It's good practice. And it's a really good safety system to make sure you have control of what's going on in your plant.
As we said, it's predominantly safety, but it also helps you understand the status of equipment in your plant. So there are some production and operations benefits to them too. So if you don't have a good permit-to-work system in your facility, you need to implement one. You need to make sure you've got the isolations in place so that people can safely interact with machinery without the risk of the machinery killing them in some way. But you also need to make sure that you can safely interact with the machinery without you interfering with it in a way that's going to cause an incident, a fire, an explosion, something like that. So, permit to works, they're absolutely critical. They have to be done no matter the scale of your business. This is about keeping people safe and also about understanding the status of your plant.
Traci: Well, as always, Trish, you help us understand what's out there for us to live safer, work safer. And I appreciate your insight on this. Unfortunate events happen all over the world, and we will be here to discuss and learn from them. Subscribe to this free podcast, so you can stay on top of best practices. On behalf of Trish, I'm Traci. And this is "Process Safety With Trish & Traci."
Trish: Stay safe.