Charles Perrow’s Normal Accidents: Living with High-Risk Technologies

Author

Jason Collins

Published

November 2, 2017

A typical story in Charles Perrow’s Normal Accidents: Living with High-Risk Technologies runs like this.

We start with a plant, airplane, ship, biology laboratory, or other setting with a lot of components (parts, procedures, operators). Then we need two or more failures among components that interact in some unexpected way. No one dreamed that when X failed, Y would also be out of order and the two failures would interact so as to both start a fire and silence the fire alarm. Furthermore, no one can figure out the interaction at the time and thus know what to do. The problem is just something that never occurred to the designers. Next time they will put in an extra alarm system and a fire suppressor, but who knows, that might just allow three more unexpected interactions among inevitable failures. This interacting tendency is a characteristic of a system, not of a part or an operator; we will call it the “interactive complexity” of the system.

For some systems that have this kind of complexity, … the accident will not spread and be serious because there is a lot of slack available, and time to spare, and other ways to get things done. But suppose the system is also “tightly coupled,” that is, processes happen very fast and can’t be turned off, the failed parts cannot be isolated from other parts, or there is no other way to keep the production going safely. Then recovery from the initial disturbance is not possible; it will spread quickly and irretrievably for at least some time. Indeed, operator action or the safety systems may make it worse, since for a time it is not known what the problem really is.

Take this example:

A commercial airplane … was flying at 35,000 feet over Iowa at night when a cabin fire broke out. It was caused by chafing on a bundle of wire. Normally this would cause nothing worse than a short between two wires whose insulations rubbed off, and there are fuses to take care of that. But it just so happened that the chafing took place where the wire bundle passed behind a coffee maker, in the service area in which the attendants have meals and drinks stored. One of the wires shorted to the coffee maker, introducing a much larger current into the system, enough to burn the material that wrapped the whole bundle of wires, burning the insulation off several of the wires. Multiple shorts occurred in the wires. This should have triggered a remote-control circuit breaker in the aft luggage compartment, where some of these wires terminated. However, the circuit breaker inexplicably did not operate, even though in subsequent tests it was found to be functional. … The wiring contained communication wiring and “accessory distribution wiring” that went to the cockpit.

As a result:

Warning lights did not come on, and no circuit breaker opened. The fire was extinguished but reignited twice during the descent and landing. Because fuel could not be dumped, an overweight (21,000 pounds), night, emergency landing was accomplished. Landing flaps and thrust reversing were unavailable, the antiskid was inoperative, and because heavy braking was used, the brakes caught fire and subsequently failed. As a result, the aircraft overran the runway and stopped beyond the end, where the passengers and crew disembarked.

As Perrow notes, there is nothing complicated in putting a coffee maker on a commercial aircraft. But in a complex interactive system, simple additions can have large consequences.

Accidents of this type in complex, tightly coupled systems are what Perrow calls “normal accidents”. When Perrow uses the word “normal”, he does not mean these accidents are expected or predictable; many of them are baffling. Rather, it is an inherent property of the system that interactions of this kind will occur from time to time.

While it is fashionable to talk of culture as a solution to organisational failures, in complex and tightly coupled systems even the best culture is not enough. No improvement to culture, organisation or management will eliminate the risk. That we continue to have accidents in industries with mature processes, good management and decent incentives not to blow up suggests that something intrinsic to these systems lies behind the accidents.

Perrow’s message on how we should deal with systems prone to normal accidents is that we should stop trying to fix them in ways that only make them riskier. Adding more complexity is unlikely to work. We should focus instead on reducing the potential for catastrophe when there is failure.

In some cases, Perrow argues that the potential scale of the catastrophe is such that the systems should be banned. Nuclear weapons and nuclear energy are, in his view, both out on this count. In other systems, the benefit is such that we should continue tinkering to reduce the chance of accidents, but accept they will occur despite our best efforts.

One possible approach to complex, tightly coupled systems is to reduce the coupling, although Perrow does not dwell on this at length. He suggests the aviation industry has done this to an extent through measures such as corridors that exclude certain types of flights. But in most of the systems he examines, decoupling appears difficult.

Despite Perrow’s thesis that accidents are normal in some systems, and that no organisational improvement will eliminate them, he dedicates considerable effort to critiquing management error, production pressures and general incompetence. The book could have been half the length with a more focused approach, but this material does suggest that, even though normal accidents cannot be eliminated, many complex, tightly coupled systems could be made safer through better incentives, competent management and the like.

Other interesting threads: