"Keeping Bits Safe: How Hard Can It Be?"

David S. H. Rosenthal has published "Keeping Bits Safe: How Hard Can It Be?" in ACM Queue.

Here's an excerpt:

There is an obvious question we should be asking: how many copies in storage systems with what reliability do we need to get a given probability that the data will be recovered when we need it? This may be an obvious question to ask, but it is a surprisingly hard question to answer. Let's look at the reasons why.

To be specific, let's suppose we need to keep a petabyte for a century and have a 50 percent chance that every bit will survive undamaged. This may sound like a lot of data and a long time, but there are already data collections bigger than a petabyte that are important to keep forever. The Internet Archive is already multiple petabytes.
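
To get a feel for how demanding that target is, here is a rough back-of-the-envelope sketch. It is not taken from the excerpt. It assumes a decimal petabyte (8 x 10^15 bits), that each bit is lost independently with the same probability, and that loss follows simple exponential decay; the function names are purely illustrative.

```python
import math

# Assumption: decimal petabyte, i.e. 10**15 bytes of 8 bits each.
BITS_PER_PETABYTE = 8 * 10**15


def max_per_bit_loss_probability(n_bits, target_survival):
    """Largest per-bit loss probability over the whole period such that
    all n_bits survive with probability target_survival.

    Assumes independent, identically distributed bit failures, so
    (1 - p_loss) ** n_bits == target_survival.
    """
    # 1 - target ** (1 / n_bits), computed with expm1 to keep precision
    # when the result is many orders of magnitude smaller than 1.
    return -math.expm1(math.log(target_survival) / n_bits)


def bit_half_life_years(p_loss, period_years):
    """Half-life of a single bit, assuming exponential decay, given its
    loss probability over period_years."""
    # Survival over the period is 0.5 ** (period / half_life).
    return period_years * math.log(0.5) / math.log1p(-p_loss)


p_loss = max_per_bit_loss_probability(BITS_PER_PETABYTE, 0.5)
half_life = bit_half_life_years(p_loss, 100)

print(f"per-bit loss probability over the century: {p_loss:.3e}")
print(f"implied bit half-life in years: {half_life:.3e}")
```

Under these assumptions, each bit may have only about a 9 in 10^17 chance of being damaged over the century, which corresponds to a bit half-life on the order of 10^17 to 10^18 years. Real storage failures are correlated rather than independent, so this is only a feel for the scale, not a reliability model.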