Redundancy

Some failures may be expected, and indeed the quantitative expectation of component failure is the basis for the rational application of 'redundancy'. Although a human can live with only one kidney, evolution has refined the design to furnish each individual with a pair. This decision has an evolutionary cost (growing and nourishing a second kidney draws metabolic resources at the expense of other growth) but it confers an advantage by offering superior peak processing capability (though one that kidney donors evidently manage without) and, more particularly, superior reliability. Spacecraft design engineers often have to trade off the consumption of resources against the gain in reliability from incorporating two units for a given subsystem. The calculus of redundancy relies on the assumption of uncorrelated failure - that units fail at random times, if not from random mechanisms, and that there is no strong expectation of both units failing at the same time from the same mechanism. It is acknowledged, however, that this assumption does not always hold! This is particularly so with environmental failures. Some radiation failures are random, but others relate to total dose effects, and redundancy will obviously offer little protection if both units are exposed to the same hazardous dose; similarly, if both units are too hot or too cold, both may fail. The configuration of redundant units requires care.
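The arithmetic behind this calculus can be illustrated with a short sketch. Assuming uncorrelated failures and a constant random failure rate, a single unit's reliability over a mission is R = e^(-λt), and a redundant pair survives unless both units fail, giving 1 - (1 - R)². The figures and function names below are purely illustrative and are not drawn from any particular spacecraft.

```python
import math

def unit_reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Reliability of a single unit, assuming a constant (random) failure rate."""
    return math.exp(-failure_rate_per_hour * hours)

def parallel_reliability(r_unit: float, n_units: int = 2) -> float:
    """Reliability of n identical units in parallel: the system survives unless
    every unit fails. Valid only if the failures are uncorrelated."""
    return 1.0 - (1.0 - r_unit) ** n_units

# Example: a unit with a 10% chance of failing over the mission duration.
r = 0.90
print(parallel_reliability(r, 1))  # 0.90 - a single unit
print(parallel_reliability(r, 2))  # 0.99 - a redundant pair
```

The improvement evaporates, of course, if both units are likely to fail together - from the same radiation dose or the same thermal excursion - which is precisely the caveat noted above.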

Often systems may be operated in parallel without mutual interference (for example, telemetry systems can transmit the same data at different frequencies) but in other cases this is not so (for example, if two or more control systems are run in parallel they may well provide signals that conflict). The challenge with dual control systems is to identify a failure: if the two units give differing signals, which one is incorrect? One approach is to provide three units and use a voting logic in which the signal from one unit is rejected if it differs from the other two. Redundancy therefore imposes complexity. A notable failure of this kind of arrangement was that of Phobos 2. This spacecraft was launched by the Soviet Union in 1988 with computers that proved to be faulty. One processor expired en route and another began to malfunction intermittently soon after the spacecraft entered orbit around Mars.4 Unfortunately, the voting logic was not itself reprogrammable, and with two failed units voting 'no' the sole functioning processor was unable to make its 'yes' vote heard! On 27 March 1989, as the spacecraft was manoeuvring in the vicinity of the eponymous Martian moon, it failed to commence a planned downlink. Some fragmentary signals were subsequently received, which indicated that the supposedly 3-axis stabilised spacecraft was spinning in a state of uncontrolled precession, beyond recovery. In one sense the proximate cause was component failure, but the inflexibility of the voting logic contributed to the loss of the spacecraft by permitting correlated failures to accumulate. Of course, the third processor may well have failed long before the spacecraft was able to complete its mission.
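The voting arrangement itself is simple to express. The sketch below shows a generic two-out-of-three majority vote in Python; it is purely illustrative and bears no relation to the actual Phobos 2 flight software, but it does show why a fixed voter is vulnerable to accumulated failures: once two channels are dead, the one healthy channel can never win the vote.

```python
def vote_2oo3(a, b, c):
    """Two-out-of-three majority vote: a value is accepted only if at least
    two of the three redundant signals agree; the dissenting channel is
    rejected. (A real voter would compare within a tolerance band.)"""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None  # no two channels agree - the voter cannot decide

# One faulty channel is outvoted by the two healthy ones...
print(vote_2oo3(1.00, 1.00, 7.31))   # -> 1.0
# ...but once two channels have failed (modelled here as None outputs),
# the sole surviving channel is always outvoted.
print(vote_2oo3(None, None, 1.00))   # -> None
```

A reconfigurable voter, or one that could be commanded to mask out channels known to be dead, would have allowed the last good processor to keep the spacecraft under control.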

Our perception of spacecraft failures - like that of other disasters such as air crashes or earthquakes - tends not to be completely rational. Although our attention may be caught by high-profile failures at any given moment, the fact is that, broadly speaking, the number of failures is decreasing and their consequences are generally becoming less serious.5 Clearly some lessons are being learned, but others are not, and we hope this book will help to improve the corporate memory of the space community.

An artist's impression of the Phobos 2 spacecraft manoeuvring in close proximity to Phobos, the larger of Mars's two moons.
