Podcast: Lessons Learned from Boeing’s 737 MAX Crisis

March 5, 2024
Boeing was known for its engineering excellence until a focus on profitability took over. Properly documenting management of change and recognizing weak signals could have prevented tragedy.

In this episode of Process Safety with Trish & Traci, we discuss how the aviation industry's Boeing 737 MAX crisis revealed critical management failures, and why rigorous management of change, prioritizing safety over profitability and identifying weak signals matter. The crashes stemmed from design flaws and inadequate pilot training, highlighting the need for systemic improvements to prevent similar tragedies. The lessons extend to other industries, underscoring proactive risk mitigation and a culture of safety.

Transcript

Background On Boeing’s Ethiopia Air Crash

Traci: The manufacturing world has looked to the airline industry on many occasions for best practices and lessons learned. One example is reliability-centered maintenance programs, which are centered on achieving the inherent safety and reliability capabilities of equipment at minimum cost. Stanley Nowlan and Howard Heap of United Airlines authored a reliability-centered maintenance report in 1978, and many maintenance professionals consider that document the Bible of maintenance and reliability. There are other times that we look to the aviation industry with a more somber objective: learning from catastrophes. Today we will talk about the missteps that led to two plane crashes just months apart. The aircraft in question on both occasions was the Boeing 737 MAX. On March 10th, 2019, 157 people died when this plane crashed in Ethiopia, and just a few months earlier, the same model crashed, killing 189 people in Indonesia. Can you give us some background on the Ethiopia crash?

Trish: Yeah, so this was a really interesting event. As you mentioned, the Lion Air crash occurred first, in October of 2018. When that occurred, there was interest in how it happened and what went wrong. But that quickly died down, and everyone kept flying the MAX 8. There's no issue here; nothing to see. Boeing were very clear: there was nothing to see. It was put down at the time to pilot error, all sorts of things like that. Then we saw the second crash of the same type of aircraft. The Ethiopian Airlines plane was only four months old. It was a brand-new plane. Now, we don't expect brand-new planes to fail. So, when that happened, the world stood back and said, "Oh, what's going on here?" And countries all over the world started saying, "No, we're grounding the MAX 8. We need to figure out what caused this. There is a problem with this."

And I remember thinking at the time, there's an interesting parallel here; we've seen this sort of thing in history before. Way, way back in the middle of the last century, there was a passenger aircraft called the de Havilland Comet, and the de Havilland Comet had a catastrophic mid-air failure. It was sort of like, "Oh, well, we don't really know what caused it. No, it couldn't have been the plane. It's all okay, we'll keep going." Then there was another catastrophic mid-air failure of a de Havilland Comet, and it took two major crashes to get that one grounded as well. It was then discovered that there was a design fault in the de Havilland Comet, which resulted in catastrophic collapse of the aircraft's hull mid-air. I reflected at the time and thought, "Hang on, we've seen this pattern before: a major aircraft crash, then the same model crashes again." And I thought, "Oh, there's something happening here." So, it's an interesting one.

When we talk about what actually caused it, subsequent investigation, with a lot of focus from aviation authorities all over the world and from Boeing as well, found that the crashes were related to something called MCAS, the Maneuvering Characteristics Augmentation System. It was an electronic, computer-based system that Boeing had installed in the aircraft to, as Boeing described it, help with the handling characteristics. It wasn't a flight control system; it was to help with the handling of the aircraft. It was needed because when they created the MAX 8, they put these massive engines on the 737, but they didn't really change the main body of the aircraft. So, to fit the engines under the wings with enough ground clearance, they had to move them far forward of the wing and up. When the 737 was first made, the engines were tucked in right under the wing. Now they were right at the front and higher up so that there was ground clearance.

Now, why should that matter? Well, that changed the entire flight characteristics of the plane. With the engines moved so far forward, the nose would tend to pitch up, and that could create an unstable stall angle for the aircraft. So they needed to install a software system that would avoid a potential stall. Aircraft have these things called angle of attack sensors, which basically tell the pilot what degrees the nose is at, whether it's pointed up, neutral or down. The MCAS was based on the angle of attack sensors. If the angle of attack sensor said that the nose was pitched up, the MCAS would automatically take over and drive the nose of the aircraft down by adjusting the stabilizers, the small wing bits that stick out on the back of the tail. It would adjust those to pitch the nose back down to a neutral position.

Now the problem is that in both the Lion Air incident and the Ethiopian Air incident, there was a failure of the angle of attack sensors: they were saying the nose was pitched up, but it wasn't. So MCAS started pointing the nose down. The angle of attack sensors were still saying, "No, the nose is up." So MCAS kept pointing the nose down until it put these aircraft essentially into a nosedive. And this happened shortly after takeoff. The greatest thing a pilot has on their side in aviation is altitude. When you're up high, you've got space to maneuver. When you're just taking off, you're close to the ground, and you don't have that vertical space to figure out what's going on and stop it before you hit the ground. Tragically, that's what actually happened. The angle of attack sensors were inputting false readings, causing the MCAS to keep driving the nose down.
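To make that failure mode concrete, here is a minimal Python sketch of the feedback loop described above. It is a toy model, not Boeing's MCAS logic: the activation threshold (`AOA_THRESHOLD_DEG`), the trim step per cycle (`TRIM_STEP_DEG`), the function name `simulate_trim` and the optional two-sensor cross-check are all hypothetical, chosen only for illustration. (Strictly, angle of attack is the angle between the wing and the oncoming airflow.) The sketch shows how one sensor stuck at a false high reading re-triggers nose-down trim on every cycle, and how a simple disagreement check against a second sensor would stop that accumulation.

```python
# Toy model only: thresholds, step sizes and the cross-check are hypothetical,
# not the actual MCAS design or its later redesign.
AOA_THRESHOLD_DEG = 15.0  # hypothetical activation threshold
TRIM_STEP_DEG = 2.5       # hypothetical nose-down trim added per activation


def simulate_trim(primary, secondary=None, max_disagreement_deg=5.0):
    """Return cumulative nose-down stabilizer trim after each control cycle.

    primary:   angle-of-attack readings (degrees) from the sensor the loop trusts
    secondary: optional readings from a second sensor; if the two disagree by
               more than max_disagreement_deg, the loop refuses to activate
    """
    trim = 0.0
    history = []
    for i, aoa in enumerate(primary):
        activate = aoa > AOA_THRESHOLD_DEG
        # Hypothetical guard: do not act on a reading the other sensor contradicts.
        if secondary is not None and abs(aoa - secondary[i]) > max_disagreement_deg:
            activate = False
        if activate:
            trim += TRIM_STEP_DEG  # each false trigger adds more nose-down trim
        history.append(trim)
    return history


# A faulty sensor stuck at 22 degrees while a healthy one reads about 5 degrees.
faulty = [22.0] * 4
healthy = [5.0] * 4
print(simulate_trim(faulty))           # [2.5, 5.0, 7.5, 10.0]
print(simulate_trim(faulty, healthy))  # [0.0, 0.0, 0.0, 0.0]
```

The point is a systems one: a loop that trusts a single input cannot tell a genuinely high nose attitude from a failed sensor, so the error compounds cycle after cycle. A second, independent reading is one simple way to expose the disagreement. The sketch deliberately leaves out pilot inputs, aerodynamics and any details of the real redesign.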

Importance of Training and Management of Change

Traci: All of this happened: moving the engines up and out to ensure there was clearance, putting in this software system. Was there training for pilots to understand what they were walking into? From what I understand from my reading, Boeing was in competition with Airbus, wanted to bring these new planes to market quickly and claimed that the pilots didn't need additional training. What happened there? I mean, training and people understanding their environment are important.

Trish: Yeah, absolutely. There was a lack of training for this. In fact, when they went to market with the MAX 8, one of the benefits they marketed to the airlines was that no simulator training was required. The records have now shown that, I believe, Lion Air had requested simulator training for the MAX 8 and Boeing said, "No, no, no, you don't need it. We're not doing it. You're not getting it." Essentially, the airlines that predominantly flew the MAX 8s didn't want to spend the time doing these extra training bits for the pilots. So Boeing said, "Yeah, look, it's just an iPad e-learning thing that'll take you about an hour to familiarize yourself," because they largely kept the cockpit identical. I mean, the cockpit of the 737 still had a lot of the original dials and gauges in it. It hadn't gone to the full electronic cockpits of all of the other aircraft.

So if you were a 737 pilot, you could pretty much get in any 737 and fly it, because the cockpit largely didn't change for you over the entire history of that aircraft. That offered a lot of advantages to airlines, because if they didn't have to put their pilots through extensive simulator training, they could change over their fleets much, much faster. So to a certain extent, Boeing were also being pressured by their customers not to make the aircraft so new that it would require training. There was a big race between Boeing and Airbus, and there still is today, as to who would dominate the supply of passenger aircraft. Boeing thought they could be faster to market if they just kept modifying and improving the range of the 737 and, more importantly, its fuel efficiency, which is a key factor for any aircraft. And that's what led to the different engines.

That was their focus: to bring an aircraft to market without having to design an entirely new aircraft. I mean, whilst the 737 is a smaller plane for Boeing, it's really the backbone of the Boeing fleet. It is the most populous plane that they have, I believe. So, it was a significant effort to get this plane out there. And they did avoid the training. They basically just said, "No, you don't need it. We're not going to provide it to you." Even in the manual for the MAX 8, the references to the MCAS were inconsistent, so the information wasn't really even in the manual. So how could a pilot diagnose what was going on if they didn't even know that the MCAS existed, let alone what it did? All of a sudden, you are flying, and the plane takes over, has a mind of its own and starts doing something that you can't control and don't know why. That's what happened.

Traci: That had to be terrifying. How does that happen? How can such a momentous design change happen without any kind of documentation, or without somebody raising their hand and going, "Everybody needs to know about this a little bit more than you're letting on"? How does that happen?

Trish: I think it really comes down to a lack of the rigorous management of change that we would be used to seeing in the chemical industry, for example. They had very good designers who knew what they were designing. I mean, they designed great planes. These people know how to build planes, but they designed piecemeal: you do that bit and you do that bit. What was missing was looking at the overall system, employing what we call in engineering systems thinking, to see how one small change in one area can create quite a significant change in another area. The interconnectedness of what goes on needs to be understood and needs to be viewed at a high level, and I think there was perhaps a lack of seeing that. And again, there was this overriding pressure to get the plane to market and ramp up production; they were focused on producing as many 737s as they could.

You will also recall that back in January this year, there was another 737 MAX issue, where an Alaska Airlines aircraft had a door plug blow out mid-air. That was a MAX 9. So we're still talking about the same family of aircraft, and in fact, I think there are still some carryover issues in how they were going about the manufacturing and design processes that potentially contributed to all of this. It's interesting to note that a report was issued to Congress this week by an expert review panel that looked at the organizational causes within Boeing of the MAX 8 problems. It's quite timely: the panel was commissioned to look at the MAX 8 issues historically, and at the FAA's involvement as well, and its report has come out just this week. It found that financial incentives kept people focused purely on profit, with no real focus on the delivery of the aircraft.

Now we need to really think about these things. If you make a widget that can't really hurt anybody, then you can focus on profit and on making it as lean and as fast to manufacture as possible, probably without too many issues. But when your product is an aircraft that people fly in, you need to focus on the aircraft's safety. It's not just about the safety of the people manufacturing the aircraft, which is still important; you've got to focus on what keeps that aircraft in the air. What's its airworthiness? The incentives in the structured payment systems really pushed for profit, potentially above airworthiness. The engineers weren't really empowered to stand up and say they thought there was a problem with something; they were very isolated within the organizational structure. And the panel very much found a fear of speaking out.

There was confusion about what the organization's safety culture was; it hadn't been rolled out effectively. There was also an issue with what was effectively self-regulation. The FAA designated people within Boeing to basically be the FAA's checkers, but those people were still at the mercy of the production managers.

So, being independent under that kind of pressure was almost impossible. We can't expect people to discharge their duties without fear or favor at all times when their livelihood depends on how their work is received by their superiors. That's just not going to work. Sadly, I think all of those things combined are how we ended up with people saying, "We don't need to worry about the training because it's not needed. We'll scale back on that. We can save money here. We can sell an aircraft there." The airline doesn't want to have to train its pilots any extra either, so that makes the aircraft more attractive. "We're in this massive race with Airbus, and we need to make sure we keep selling our 737s because that's one of our most profitable aircraft."

So, you can see how all of these things combined within the organization to create the perfect storm for what we sadly saw. I think Boeing has a long road ahead of them, but to their credit, they're making changes and doing things to try to improve this. They've agreed that they can't just ramp up production of the MAX aircraft now. They need to go back and get it right before they can continue down this path.

Process Safety Lessons Learned, Finding Weak Signals

Traci: And getting it right, and convincing everybody else that you got it right, is a challenge in itself. What lessons can we learn from this for chemical facilities? What can we take from this and apply to our own work to make our own facilities better in the future?

Trish: Yeah, for me, I think there are three key lessons in this. The first one is management of change. You have to get your management of change right; you have to focus on it. You have to identify what a change is and make sure you understand all the risks associated with it.

The next one is not to be lulled into focusing on profitability over everything else, especially when the work you do has the potential to kill people. You need to make sure you focus on engineering excellence, which is what Boeing was historically known for.

And the third part is identifying weak signals. Weak signals are one of my favorite topics, and the reason I bring them up here is that on the flight before the Lion Air crash, in that very plane, the MCAS actually activated for the first time. On that particular flight, there happened to be a third pilot hitching a ride, sitting in the cockpit with the other two pilots. So, while the two pilots flying the plane were trying to figure out what was going on, the third pilot looked, saw the stabilizer dial moving and said, "Oh, the stabilizer's doing something; we need to stop that." It was a weak signal. They intervened, were able to switch off whatever was going on and flew the plane.

The problem was reported, a repair was made, and it was said that the issue had been sorted. Then the aircraft went on its next flight, and sadly there were only two pilots in the cockpit that day. There wasn't that third person, who shouldn't even have been there, to notice that the dial was moving. No one expected it to move, so they weren't looking at it; they were looking for other things at the time. That's why finding those weak signals matters. You wonder whether there could have been a different outcome, before the first Lion Air crash and before the Ethiopian Air crash, had that been clearly identified as a weak signal and not just a false alarm.

Traci: I know you have a full dissertation on weak signals and how to identify them. With this topic, the first time, it's an unfortunate event. The second time, it's suspect, and you have that 20/20 vision to say, "Okay, that was a weak signal." Are they going to change the instrumentation so that the two pilots can see everything they need to see without having to look down? You know what I'm saying? Having that third pilot there was the only reason they saw that.

Trish: Correct. Yeah. The MCAS system has changed. It has been heavily modified and rewritten. The pilots also know about the MCAS system now, which was a key thing: if you know that there's this thing that can take over, then you know you can turn it off. A lot has changed in the design. I'm confident the MCAS system has no issues now. It has been thoroughly investigated and thoroughly redesigned, and that issue has been addressed with an engineered fix. They obviously had to do that, because they had pilots refusing to fly these aircraft, let alone passengers refusing to get on them. So that has been rectified, but the question is how we make sure we get it right the first time, so we are not having to go back and learn from the past after hundreds of people have tragically been killed.

Traci: And it brings human factors into it too. If you see something, say something, so that those types of things can change and we can get safer environments for everybody. Trish, is there anything you'd like to add on this topic?

Trish: I think there's some really interesting reading out there. If people are interested in understanding some of these aspects, I would recommend reading the report to Congress that came out this week. I'll give you the title, because it's not obvious what the report's about if you don't know what you're looking for: it's the Section 103 Organization Designation Authorizations for Transport Airplanes Expert Panel Review Report. That's the report you want to Google.

It's worth finding, and it's not that long, only about 50 pages, so it's a short report, but it's worth taking a look at. The other thing that I found quite interesting to read was a book called Flying Blind: The 737 MAX Tragedy and the Fall of Boeing. It's an interesting book that goes through the history of the organization and talks about some of the changes and how they happened. I'm sure there are things in it that may not be entirely accurate if you're inside the system and know what's going on. But certainly for the layperson, it paints a picture. Have a read of it and think: could we fall into that trap in our facility or our company? Is there something we could fall into here? And what can I do to stop that?

Traci: Thank you Trish for helping us look at the entire process and the interconnectedness to make sure that we're not missing things. Unfortunate events happen all over the world, and we will be here to discuss and learn from them. Subscribe to this free podcast so you can stay on top of best practices. You can also visit us at chemicalprocessing.com for more tools and resources aimed at helping you run efficient and safe facilities. On behalf of Trish, I'm Traci, and this is Process Safety With Trish and Traci.

Trish: Stay safe.

About the Author

Traci Purdum | Editor-in-Chief

Traci Purdum, an award-winning business journalist with extensive experience covering manufacturing and management issues, is a graduate of the Kent State University School of Journalism and Mass Communication, Kent, Ohio, and an alumnus of the Wharton Seminar for Business Journalists, Wharton School of Business, University of Pennsylvania, Philadelphia.