Monday, 15 October 2007

Back-ups and archives

So what is the difference between a back-up and an archive? Don’t both copy data somewhere so it can be restored later if necessary? The answer to the second question is sort of “yes”, and the answer to the first question is what this blog is about.

Back-ups of data can be stored at the same location as the original or offsite. If the main site suffers a catastrophe, the data can be restored somewhere else from the offsite back-up and work can continue. Back-ups used to be written to tape, and the tapes would be overwritten after a week or some other fairly short period. The data in a back-up was the same as the data left on the mainframe.

An archive is something completely different. Gartner has suggested that the amount of data in a typical database is growing by 125%. For performance reasons, no one can afford to leave unused data in a database. Unused data is data that isn’t needed operationally and won’t be referenced: no transaction will need it. This data can be moved out of the database and into an archive. The database will then be smaller, so reorgs and back-ups will run more quickly. Working with a smaller database also needs less CPU, so everything else will perform better. In addition to better performance, organizations will enjoy reduced costs. So archiving gives a huge return on investment.

The big problem with archived data is that it needs to hang around for a long time. In fact, under new laws and regulations this could be up to 30 years! A lot can change in 30 years. The schema of your database may change; in fact, because of takeovers, mergers, and other reasons, your brand of database may change. There’s even a chance that you won’t have a mainframe! What you need is a future-proof storage mechanism. You also need to be able to access the data in your archive. Many countries now allow electronic records to be used in court, and those archived records must be accessible. It’s no good hoping in 20 years’ time that you can restore some back-ups because, even if you have the same database, you probably won’t have the same schema. You need to be able to access the data, retrieve it, and produce reports about it.

As well as being able to run e-discovery tools against your archive (when it comes to litigation, both sides need to know what you’ve got!), you need to ensure that the archive cannot be tampered with. It’s no good finding that five years ago someone accessed the archive and covered the tracks of their previous ten years of misdeeds. The archived data has to be read-only.

And, of course, when the time comes, you have to be able to delete data that has reached the end of both its business life and its compliance life.
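The deletion rule above boils down to comparing two dates: a record can go only when the later of its two retention periods has run out. Here is a minimal Python sketch, assuming both periods are whole years counted from the same record date; the function names `add_years` and `can_delete` are hypothetical and not part of any archiving product.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift d by whole years; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached when d is 29 February and the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def can_delete(record_date: date, business_years: int,
               compliance_years: int, today: date) -> bool:
    """True once BOTH the business and the compliance retention periods have ended.

    Assumption: both periods start from the same record_date.
    """
    business_end = add_years(record_date, business_years)
    compliance_end = add_years(record_date, compliance_years)
    # The record must be kept until the later of the two end dates.
    return today >= max(business_end, compliance_end)
```

For example, a record dated 15 October 2007 with a 5-year business life and a 30-year compliance life is not deletable in 2020, because the compliance period still has years to run.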

So archiving has much more to it than simple back-ups. It’s quite a big difference.
