What Are Bit Flips And How Are Spacecraft Protected From Them?

Cosmic rays from space can cause data stored in computers to get bit-flipped. Fortunately, we have developed clever techniques to rectify this issue.

We’re all familiar with computers and their usefulness in our daily lives. We use them for many things, from entertainment and gaming to finances, accounting, and even solving complicated equations that model how galaxies form and how biological systems behave.

Computers are deeply integrated into our lives; even our smartphones are essentially mini-computers.

But are computers 100% perfect?

Not really, right? Computers can crash, get infected with (computer) viruses, be slowed down by bloatware, or be compromised by ransomware.

Even the natural world has a way of messing with computers. In this article, we’ll explore how: what bit flips are, how cosmic rays cause them, and how they can lead to critical errors inside a computer. We will also briefly look at how the specialized computers on board spacecraft cope with cosmic rays and correct bit flips.


About Bit Flips

A pictorial representation of what happens when a bit flip takes place. (Credits: Jens Vankeirsbilck/Researchgate)

A bit flip is an unintended change in data stored in memory. Computers store data as bits, 0s and 1s. When a bit is ‘flipped,’ its value is inverted: a 0 becomes a 1, and a 1 becomes a 0.

A bit flip can happen when a high-energy charged particle strikes the memory hardware. Such a particle could be an alpha particle or a cosmic ray originating from space. As it passes through the chip, it frees electric charge that can disturb the stored charge representing the bit, causing the bit to flip.

Bit flips fall under the category of ‘soft errors.’ The hardware itself is not damaged, so after a soft error the correct value can be restored, for example by using an error-correcting code to work out the original value and rewriting the affected bit. This is different from a hard error, which is usually the result of faulty or damaged hardware; fixing a hard error requires repairing or replacing the hardware itself.
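The difference can be illustrated with a toy model. This is a simplified sketch, not how real memory hardware is built: the `MemoryCell` class and its `stuck_at` parameter are hypothetical, with `stuck_at` standing in for a damaged cell that always holds the same value.

```python
class MemoryCell:
    """Toy one-bit memory cell (hypothetical model for illustration)."""

    def __init__(self, stuck_at=None):
        self.value = 0
        self.stuck_at = stuck_at  # None = healthy hardware; 0 or 1 = damaged cell

    def write(self, bit):
        # A damaged (stuck) cell ignores writes and keeps its stuck value.
        self.value = bit if self.stuck_at is None else self.stuck_at

    def read(self):
        return self.value

    def upset(self):
        # Soft error: radiation inverts the stored bit of a healthy cell.
        if self.stuck_at is None:
            self.value ^= 1


healthy = MemoryCell()
healthy.write(1)
healthy.upset()        # bit flip: the cell now reads 0
healthy.write(1)       # rewriting the correct value repairs the soft error
print(healthy.read())  # 1

damaged = MemoryCell(stuck_at=0)
damaged.write(1)       # rewriting cannot fix a hard error
print(damaged.read())  # 0
```

Rewriting the correct value undoes the soft error in the healthy cell, but no amount of rewriting fixes the stuck cell; it has to be replaced.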

As mentioned, cosmic rays are one of the reasons that bit flips occur in memory devices.

How Do Cosmic Rays Cause Bit Flips?

Cosmic rays are high-energy particles originating from outer space. They consist mainly of protons, along with a smaller fraction of helium nuclei and trace amounts of heavier nuclei and other particles such as electrons.

This is a general representation of what happens when a cosmic ray enters the Earth’s atmosphere. This conversion of cosmic rays into pions and muons has been referred to as the ‘cosmic ray cascade.’ (Credits: Theturnipmaster/Wikimedia Commons)

When these cosmic rays reach the upper layers of Earth’s atmosphere, they collide with the nuclei of atoms in the air. These collisions produce showers of secondary particles, mainly pions, which in turn decay into muons. Muons interact only weakly with matter and easily reach the surface of the Earth.

RAM and flash memory are built from transistors, specifically metal-oxide-semiconductor field-effect transistors (MOSFETs). Each bit is stored as a voltage or charge level in the memory circuit: one level represents a 1 and another represents a 0, and the read circuitry tells them apart by comparing the stored level against a threshold.

A bit flip occurs when a charged particle, such as a cosmic ray, passes through a MOSFET and leaves a trail of free charge in the silicon. If enough of that charge collects at the node holding the bit, it shifts the stored voltage past the level that separates a 0 from a 1, and the bit flips.
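A back-of-the-envelope model shows why the amount of collected charge matters. In this sketch, all the numbers (a 1 V stored level, a 0.5 V read threshold and a 1 fF storage node) are hypothetical round values chosen for illustration, not figures for any real chip. The voltage change is the collected charge divided by the node’s capacitance, and the smallest charge that flips the bit is often called the critical charge.

```python
# Hypothetical, round-number parameters for illustration only.
SUPPLY_V = 1.0       # voltage stored on the node for a logic 1
THRESHOLD_V = 0.5    # the read circuit reports 1 above this level, 0 otherwise
NODE_CAP_FF = 1.0    # capacitance of the storage node, in femtofarads


def read_bit(voltage: float) -> int:
    """Interpret a node voltage as a bit by comparing it with the threshold."""
    return 1 if voltage > THRESHOLD_V else 0


def after_strike(voltage: float, charge_fc: float) -> float:
    """Voltage left on a node holding a 1 after a strike collects `charge_fc`.

    Delta V = Q / C, and femtocoulombs divided by femtofarads gives volts.
    """
    return voltage - charge_fc / NODE_CAP_FF


# Smallest collected charge that drags a stored 1 down to the threshold.
critical_charge_fc = (SUPPLY_V - THRESHOLD_V) * NODE_CAP_FF

for charge in (0.2, 0.6):
    v = after_strike(SUPPLY_V, charge)
    print(f"{charge} fC collected -> {v:.2f} V -> reads {read_bit(v)}")
print(f"critical charge: {critical_charge_fc} fC")
```

In this toy model, a strike that collects less than the critical charge leaves the bit intact, while a larger one flips the stored 1 to a 0.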

Here is an illustration of the two types of metal-oxide-semiconductor field-effect transistor (MOSFET). These transistor types are used extensively in memory storage devices. (Credits: Fouad A. Saad/Shutterstock)

Computers on the surface of the Earth are largely protected from cosmic rays, because the atmosphere absorbs most of the incoming particles and most of the charged particles that do reach the ground are muons. As a result, bit flips in computers on the ground are comparatively rare. This is not the case for spacecraft traveling in outer space. Without the shielding of Earth’s atmosphere, they are bombarded directly by cosmic rays, which makes them far more vulnerable to bit flips.

Rectifying Bit Flips

Spacecraft cannot avoid cosmic rays once they have left Earth’s atmosphere, but there are things we can do to correct a bit flip after it has occurred. Sometimes, simply rebooting clears the bit-flipped data, because memory is refreshed and reinitialized with its correct values. However, this does not always work, and more robust techniques may be needed.

Error-correction codes (ECCs) are another way to fix the errors created by bit flips. The simplest check is a parity bit: when data is stored, an extra bit records whether the number of 1s it contains is even or odd. When the data is read back, the 1s are counted again; if the result no longer matches the stored record, a bit must have flipped. A single parity bit can detect that an error occurred, but not which bit was affected.
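Here is a minimal Python sketch of such a parity check, assuming even parity (the extra bit is chosen so that the total number of 1s is even). The function names are our own, for illustration only.

```python
def parity_bit(bits: list[int]) -> int:
    """Even parity: 1 if the data has an odd number of 1s, making the total even."""
    return sum(bits) % 2


def store(bits: list[int]) -> list[int]:
    # Save the data together with its parity bit, computed at write time.
    return bits + [parity_bit(bits)]


def check(stored: list[int]) -> bool:
    """True if the number of 1s (data plus parity bit) is still even."""
    return sum(stored) % 2 == 0


word = store([1, 0, 1, 1, 0, 0, 1, 0])
print(check(word))  # True: no error

word[3] ^= 1        # simulate a single bit flip
print(check(word))  # False: error detected, but we cannot tell which bit
```

Note that if two bits flip, the count of 1s is even again and the error goes unnoticed, which is one reason stronger codes are used.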

This diagram shows how a cosmic ray might strike a MOSFET in order to cause a bit-flip.

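More sophisticated ECCs, such as Hamming codes, use several check bits so that the position of a single flipped bit can be pinpointed and the bit flipped back, correcting the error rather than merely detecting it.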
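The classic textbook example is the Hamming(7,4) code, which protects 4 data bits with 3 check bits. The Python sketch below uses the standard layout with check bits at positions 1, 2 and 4; the function names are illustrative, and real ECC memory typically uses wider variants of the same idea. Each check bit covers a different subset of positions, so the pattern of failed checks (the syndrome) spells out the position of the flipped bit in binary.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into 7 bits laid out as positions 1..7 = p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(code)                     # don't modify the caller's list
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]     # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]     # recheck positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]     # recheck positions 4, 5, 6, 7
    position = s1 + 2 * s2 + 4 * s3    # syndrome = 1-based position of the bad bit
    if position:
        c[position - 1] ^= 1           # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[4] ^= 1                       # simulate a flip at position 5
fixed, pos = hamming74_decode(codeword)
print(fixed, pos)                      # [1, 0, 1, 1] 5
```

This code corrects one flipped bit per 7-bit block. If two bits flip, it will "correct" the wrong bit, so practical memory systems usually add one more parity bit to at least detect double errors.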

Another way to handle bit flips is a technique called modular redundancy. Here, the same data is obtained (or the same computation performed) several times, and a majority vote decides which result is correct.

For example, if the correct data is ‘1,’ then repeating it three times should give ‘111.’ Now suppose a bit flip occurred and we obtain ‘110’ instead. Since ‘1’ is still the majority, modular redundancy tells us that ‘1’ is the correct value for the bit.

Modular redundancy that uses three repetitions is called 3-way modular redundancy or triple modular redundancy.
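In hardware, the ‘voter’ is a small logic circuit, but the same majority rule is easy to express in code. This Python sketch (the name `majority3` is our own) votes bit by bit on three copies of a whole byte, so each bit position is decided independently, just like the ‘110’ example above.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies: an output bit is 1 if at least two inputs have a 1 there."""
    return (a & b) | (a & c) | (b & c)


# Three copies of the same 8-bit value; the third suffered a bit flip.
copies = (0b10110010, 0b10110010, 0b10110011)
voted = majority3(*copies)
print(f"{voted:08b}")  # 10110010
```

Because every output bit only needs two agreeing inputs, one corrupted copy is outvoted no matter which of its bits flipped.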

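The Space Shuttle took this idea further with five flight computers: four ran identical software and cross-checked one another’s results, while the fifth ran separately developed backup software. While effective, modular redundancy multiplies the mass, power and hardware a spacecraft must carry, which makes it difficult to implement.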
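With more than three copies, the same vote generalizes to N replicas. The Python sketch below is a simplified illustration under our own assumptions, not the Shuttle’s actual voting logic: the hypothetical `vote` function returns the value reported by a strict majority of replicas, plus the indices of any replicas that disagreed, which a system could use to reset or ignore a misbehaving unit.

```python
from collections import Counter


def vote(results):
    """Majority vote over N replicas.

    Returns (winner, dissenters): `winner` is the value reported by more than
    half of the replicas, or None if no value has a strict majority;
    `dissenters` lists the indices of replicas that disagreed with the winner.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):  # no strict majority (e.g. a 2-2 split)
        return None, []
    dissenters = [i for i, r in enumerate(results) if r != value]
    return value, dissenters


print(vote([42, 42, 17, 42, 42]))  # (42, [2])
print(vote([42, 42, 17, 17]))      # (None, [])
```

With an even number of replicas a tie is possible, so the function reports that no majority exists rather than guessing.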

This is a schematic representation of how 3-way modular redundancy works. Here, three inputs are fed (as a result of three repetitions) and the majority value is picked by the ‘voter.’ (Credits: Arslan Ahmed Amin/Sage Journals)

A Final Word

Outer space is harsh on computers. Just as the atmosphere protects plants and animals on Earth from dangerous radiation from outer space, it also protects computers and other electronic instruments. However, with the number of space missions increasing, and ambitious ones such as missions to Mars being planned, cosmic-ray bit flips must be taken very seriously. Outer space is not somewhere you want computers to develop errors and crash at unexpected times.

Astronaut Chris Hadfield using a computer while aboard the International Space Station. (Credits: The U.S. National Archives)

The good news is that we have found some clever ways to overcome bit flips. Errors caused by bit flips in satellites, spacecraft, space telescopes and space stations rarely make headlines: they inevitably happen, but they can usually be detected and corrected using the techniques described above. This helps keep our space missions safe, while respecting the undeniable fact that outer space is quite unforgiving to nearly everything!
