How do you ensure safety in a cutting-edge R&D project? Below are some notes from a paper on the Columbia and Challenger disasters. My high-level impressions are:
- In both cases, engineers knew that key systems were not safe and tried to report their concerns to management, which decided to ignore them and go ahead with the launches.
- Competitive pressure from the Soviets and schedule pressure from Congress pushed NASA toward fast progress and sloppiness.
Some rough notes below:
NASA’s safety practices were reviewed in the aftermath of the Columbia disaster by the Columbia Accident Investigation Board (CAIB):
The board arrived at some far‐reaching conclusions. According to the CAIB, NASA did not have in place effective checks and balances between technical and managerial priorities, did not have an independent safety program, and had not demonstrated the characteristics of a learning organization. The board found that the very same factors that had caused the Challenger disaster 17 years earlier, on January 28, 1986, were at work in the Columbia tragedy (Rogers Commission 1986).
Richard Feynman said of NASA:
The argument that the same risk was flown before without failure is often accepted as an argument for the safety of accepting it again. Because of this, obvious weaknesses are accepted again, sometimes without a sufficiently serious attempt to remedy them, or to delay a flight because of their continued presence.
It appears that information about the risks was available within the relevant groups. However, it was ignored by the decision makers:
The Rogers Commission had criticized NASA’s decision‐making system, which “did not flag rising doubts” among the workforce with regard to the safety of the shuttle. On the eve of the Challenger launch, engineers at Thiokol (the makers of the O‐rings) suggested that cold temperatures could undermine the effectiveness of the O‐rings. After several rounds of discussion, NASA management decided to proceed with the launch. Similar doubts were raised and dismissed before Columbia’s fateful return flight. Several engineers alerted NASA management to the possibility of serious damage to the thermal protection system (after watching launch videos and photographs). After several rounds of consultation, it was decided not to pursue further investigations (such as photographing the shuttle in space). Such an investigation, the CAIB report asserts, could have initiated a life‐saving operation.
Schedule pressure has been identified as a key driver of risk:
Both commissions were deeply critical of NASA’s safety culture. The Rogers Commission noted that NASA had “lost” its safety program; the CAIB speaks of “a broken safety culture.” In her seminal analysis of the Challenger disaster, Diane Vaughan (1996) identified NASA’s susceptibility to “schedule pressure” as a factor that induced NASA to overlook or downplay safety concerns. In the case of Columbia, the CAIB observed that the launch date was tightly coupled to the completion schedule of the International Space Station. NASA had to meet these deadlines, the CAIB argues, because failure to do so would undercut its legitimacy (and funding).