How do you ensure safety in a cutting-edge R&D project? Below are some notes from a paper on the Columbia and Challenger disasters. My high-level impressions are:
- In both cases, engineers knew that key systems were not safe and tried to report their concerns to management. Despite having adequate information, management ignored the warnings and went ahead as planned.
- Competitive pressure from the Soviets and schedule pressure from Congress pushed NASA toward fast progress and sloppiness.
- It was pointed out to me that Edward Tufte did some interesting work analyzing how information was presented to decision makers, and how that presentation contributed to the disasters.
Some rough notes below:
NASA’s safety practices were reviewed by the Columbia Accident Investigation Board (CAIB) in the aftermath of the Columbia disaster:
The board arrived at some far‐reaching conclusions. According to the CAIB, NASA did not have in place effective checks and balances between technical and managerial priorities, did not have an independent safety program, and had not demonstrated the characteristics of a learning organization. The board found that the very same factors that had caused the Challenger disaster 17 years earlier, on January 28, 1986, were at work in the Columbia tragedy (Rogers Commission 1986).
Richard Feynman said of NASA:
The argument that the same risk was flown before without failure is often accepted as an argument for the safety of accepting it again. Because of this, obvious weaknesses are accepted again and again, sometimes without a sufficiently serious attempt to remedy them, or to delay a flight because of their continued presence.
It appears that information about the risk was available within the relevant groups. However, it was ignored by the decision makers:
The Rogers Commission had criticized NASA’s decision‐making system, which “did not flag rising doubts” among the workforce with regard to the safety of the shuttle. On the eve of the Challenger launch, engineers at Thiokol (the makers of the O‐rings) suggested that cold temperatures could undermine the effectiveness of the O‐rings. After several rounds of discussion, NASA management decided to proceed with the launch. Similar doubts were raised and dismissed before Columbia’s fateful return flight. Several engineers alerted NASA management to the possibility of serious damage to the thermal protection system (after watching launch videos and photographs). After several rounds of consultation, it was decided not to pursue further investigations (such as photographing the shuttle in space). Such an investigation, the CAIB report asserts, could have initiated a life‐saving operation.
Schedule pressure has been identified as a key driver of risk:
Both commissions were deeply critical of NASA’s safety culture. The Rogers Commission noted that NASA had “lost” its safety program; the CAIB speaks of “a broken safety culture.” In her seminal analysis of the Challenger disaster, Diane Vaughan (1996) identified NASA’s susceptibility to “schedule pressure” as a factor that induced NASA to overlook or downplay safety concerns. In the case of Columbia, the CAIB observed that the launch date was tightly coupled to the completion schedule of the International Space Station. NASA had to meet these deadlines, the CAIB argues, because failure to do so would undercut its legitimacy (and funding).
Further context from Wikipedia:
According to Ebeling, a second conference call was scheduled with only NASA & Thiokol management, excluding the engineers. For reasons that are unclear, Thiokol management disregarded its own engineers’ warnings and now recommended that the launch proceed as scheduled; NASA did not ask why. Ebeling told his wife that night that Challenger would blow up.