Why Data Deduplication Is Important

Word Count:
440

Summary:
One of the biggest challenges facing the data storage community is how to store data effectively without storing the exact same data again and again in different locations on the same servers, hard drives, tape libraries, etc. There have been many attempts to address these redundancies, some more successful than others. As the cost of many data storage options dropped significantly, an attitude took hold in the data storage community that data storag...


Keywords:
data,deduplication,storage,san


Article Body:
One of the biggest challenges facing the data storage community is how to store data effectively without storing the exact same data again and again in different locations on the same servers, hard drives, tape libraries, etc. There have been many attempts to address these redundancies, some more successful than others. As the cost of many data storage options dropped significantly, an attitude took hold in the data storage community that data storage savings was an exercise whose time had passed. Then, as the regulatory environment became more stringent, the volume of retained data began to explode again, and more and more options were considered to address data storage concerns.
     
The latest answer offered by the data storage field is the technology known as data deduplication. Also known as "single-instance storage" and "intelligent compression", this data storage method takes a piece of data and stores it once. Every later occurrence of that data is replaced by a pointer that refers back to the original stored copy. This is especially effective when multiple copies of the same data are being archived, because only one instance of the data needs to be archived. This reduces storage requirements and back-up times substantially.
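
The store-once-and-point idea can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the class name SingleInstanceStore, its methods, and the sample file names are all hypothetical, and SHA-256 is assumed here as the way to recognize identical content.

```python
import hashlib


class SingleInstanceStore:
    """Toy single-instance store (hypothetical): each unique payload is kept once."""

    def __init__(self):
        self.blobs = {}     # digest -> payload bytes, stored exactly once
        self.pointers = {}  # item name -> digest, acting as the "pointer"

    def put(self, name, payload):
        digest = hashlib.sha256(payload).hexdigest()
        # Keep the payload only if this content has not been seen before.
        self.blobs.setdefault(digest, payload)
        self.pointers[name] = digest
        return digest

    def get(self, name):
        # Follow the pointer back to the single stored copy.
        return self.blobs[self.pointers[name]]


store = SingleInstanceStore()
report = b"quarterly report " * 1000
for user in ("alice", "bob", "carol"):
    store.put(f"{user}/report.doc", report)

logical_bytes = len(report) * len(store.pointers)           # what the users "see"
physical_bytes = sum(len(b) for b in store.blobs.values())  # what is actually kept
print(logical_bytes, physical_bytes)
```

Three archived names end up pointing at one stored copy, so the physical size stays at the size of a single report no matter how many accounts hold it.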
     
If a department-wide e-mail attachment (2 megabytes in size) is distributed to 50 different e-mail accounts and each one must be archived, then instead of saving the attachment 50 times (100 megabytes), it is saved once (2 megabytes), a savings of 98 megabytes of storage space for this one attachment. Multiply this over numerous departments and thousands of e-mails over the course of a year and the savings can be quite substantial. Recovery time objectives (RTO) improve significantly with the use of data deduplication, which also reduces the need for back-up tape libraries. This also lowers most storage space requirements, realizing significant savings in every area of hardware storage procurement.
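
The attachment arithmetic can be checked with a short Python helper. The function name dedup_savings_mb is made up for this illustration, and the calculation assumes the pointers themselves take up negligible space.

```python
def dedup_savings_mb(size_mb, copies):
    """Space saved (in MB) when `copies` identical items are stored only once.

    Assumes pointer overhead is negligible.
    """
    stored_without_dedup = size_mb * copies
    stored_with_dedup = size_mb if copies > 0 else 0
    return stored_without_dedup - stored_with_dedup


# The e-mail example: a 2 MB attachment sent to 50 accounts.
saved = dedup_savings_mb(size_mb=2, copies=50)
print(saved)  # 98
```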
      
Operating at the block (sometimes byte) level allows smaller pieces of data to be saved, as the unique iterations of each block that has been changed are recognized and saved. Instead of saving a whole file each time a bit of information in that file changes, only the changed information is saved. Hash algorithms such as SHA-1 or MD5 are used to generate an identifying number for each block, so that a block whose hash has already been stored can be recognized as a duplicate. Accidental collisions between different blocks are extremely unlikely but not impossible. Data deduplication is most effective when used in conjunction with other data reduction methods; delta differencing and conventional compression are two such methods. This combination can greatly reduce any errors non-redundant systems might incur.
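
Block-level deduplication can be illustrated with a minimal Python sketch. It makes several assumptions that are not from any specific product: fixed 4096-byte blocks, an in-memory dictionary as the block store, and the hypothetical function names split_blocks, backup and restore. SHA-1 is used because it is named above; many newer designs prefer a stronger hash such as SHA-256.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def backup(data, block_store):
    """Store only unseen blocks; return (list of block hashes, new block count)."""
    recipe = []
    new_blocks = 0
    for block in split_blocks(data):
        digest = hashlib.sha1(block).hexdigest()
        if digest not in block_store:
            block_store[digest] = block  # first time this block is seen
            new_blocks += 1
        recipe.append(digest)            # pointer to the stored block
    return recipe, new_blocks


def restore(recipe, block_store):
    """Rebuild the original data by following the block hashes in order."""
    return b"".join(block_store[d] for d in recipe)


block_store = {}

# Version 1: four distinct blocks, each filled with a different byte value.
version1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
recipe1, new1 = backup(version1, block_store)

# Version 2: the same file with a single byte changed inside the second block.
edited = bytearray(version1)
edited[5000] = 0xFF
version2 = bytes(edited)
recipe2, new2 = backup(version2, block_store)

print(new1, new2)
```

The first backup stores all four blocks, while the second stores only the one block that contains the changed byte; the other three are recognized by their hashes and referenced again.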