Stop Repeating Yourself – How to eliminate redundant data from backups
Reprinted with permission from HP
Data deduplication is a storage technology for managing explosive data growth and providing data protection. There’s a lot of talk about the technique, but little clarity on details and practical applications.
What is it?
Data deduplication is a method for eliminating redundant data from storage, especially from backups. It works by saving a single copy of identical data, replacing any further instances with pointers back to that one copy.
Here’s a simple example: Say 500 people receive a company-wide e-mail with a 1 megabyte attachment. If each recipient saves that attachment locally, it is replicated 500 times on desktops around the network. During backup, a system without data deduplication would then store the data in that one attachment 500 times — consuming 499 MB more backup space than necessary.
Data deduplication backs up just one instance of the attachment’s data and replaces the other 499 instances with pointers back to that copy.
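The single-copy-plus-pointers idea can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's implementation: a content hash of each file acts as the pointer, and identical files are stored only once.

```python
import hashlib

class FileDedupStore:
    """Toy whole-file deduplication store: identical files are kept once."""

    def __init__(self):
        self.blobs = {}    # content hash -> file bytes (each stored once)
        self.catalog = {}  # file name -> content hash (the "pointer")

    def backup(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:   # first copy: store the actual data
            self.blobs[digest] = data
        self.catalog[name] = digest    # every copy: store only a pointer

    def restore(self, name):
        return self.blobs[self.catalog[name]]

store = FileDedupStore()
attachment = b"x" * 1_000_000          # the 1 MB attachment
for user in range(500):                # 500 recipients back up the same file
    store.backup(f"user{user}/deck.ppt", attachment)

print(len(store.catalog))              # 500 files cataloged
print(len(store.blobs))                # but only 1 copy of the data stored
```

Restoring any of the 500 files simply follows its pointer back to the single stored copy.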
The technology also works at a second level: If a change is made to the original file, then data deduplication saves only the block or blocks of data actually altered. (A block is typically tiny, somewhere between 2 KB and 10 KB of data.)
So let’s say our 1 MB attachment is a presentation, and its title is changed. Data deduplication would save only the new title, usually in a single 4 KB data block, with pointers back to the first iteration of the file. Thus, only 4 KB of new backup data is retained.
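Block-level deduplication can be sketched the same way: split each file into fixed-size blocks, hash each block, and store only the blocks that have not been seen before. The 4 KB block size and all names below are illustrative assumptions, not a real product's design.

```python
import hashlib
import os

BLOCK_SIZE = 4 * 1024  # 4 KB blocks, as in the example above

def backup_blocks(data, block_store):
    """Split data into fixed-size blocks; store each unique block once.
    Returns a 'recipe': the ordered block hashes needed to rebuild the file."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in block_store:
            block_store[digest] = block  # new or changed block: store it
        recipe.append(digest)            # unchanged block: pointer only
    return recipe

store = {}
original = bytearray(os.urandom(1_000_000))  # the 1 MB presentation
backup_blocks(bytes(original), store)
blocks_after_first = len(store)

original[0:9] = b"New title"                 # edit only the title
recipe = backup_blocks(bytes(original), store)
new_blocks = len(store) - blocks_after_first
print(new_blocks)                            # only 1 new 4 KB block stored
```

The second backup of the whole 1 MB file adds a single 4 KB block to the store; every other block is referenced through an existing hash.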
When used in conjunction with other methods of data reduction, such as conventional data compression, data deduplication can cut data volume even further.
Now extrapolate that scenario beyond e-mail to the thousands of gigabytes of data a typical organization backs up every month or year. That’s a lot of storage that data deduplication could help you free up, allowing you to retain more backups for a longer time on a given amount of space. And the benefits can go even further.
A little myth-busting
It might seem that squeezing more data into less space would mean there’s more room to cram in new data, but that’s not quite how data deduplication works. Because the technology replaces repeated data with pointers, and successive backups repeat most of their data, the deduplication ratio increases with each backup you make.
However, new unique data gains nothing from those pointer-based savings. What the technique really delivers is the ability to store more backups for a longer time in the same amount of space.
That means a faster recovery when you need an older version of data (as opposed to retrieving a tape from a remote site). But it doesn’t necessarily translate into freeing up room for more unique data.
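Back-of-the-envelope arithmetic shows why the ratio climbs with each backup. The numbers below (a 100 GB data set with roughly 1 GB of change per night) are illustrative assumptions, not figures from the article:

```python
logical_gb = 100  # size of each full backup as the application sees it
changed_gb = 1    # new, unique data written each night

ratios = {}
for night in (1, 7, 30, 90):
    stored = logical_gb + (night - 1) * changed_gb  # unique data actually kept
    ratios[night] = night * logical_gb / stored
    print(f"after {night:3d} backups: {ratios[night]:4.1f}:1 dedup ratio")
```

After one backup the ratio is 1:1; after 30 nightly backups it is better than 23:1, because each additional backup adds only its 1 GB of unique data while the logical amount protected grows by the full 100 GB.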
Comparing technologies
When it comes to data deduplication, one size does not fit all. That’s why it is important to consider a solution’s approach at the following three levels before making a decision:
Which approach is best for my organization?
The best approach to data deduplication depends on your size and backup needs.
The importance of options
Some companies offer only one of the two methods: object-level differencing or hash-based chunking. However, the two technologies have different strengths and weaknesses in different environments. That’s why HP now offers both options in configurations tailored to the needs of different business environments:
No matter your needs, HP puts a range of data deduplication options at your disposal, not just one that’s been scaled up or down.
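For a flavor of the hash-based chunking approach, here is a toy content-defined chunker in Python. The rolling checksum and all of the constants are illustrative assumptions (real products use stronger algorithms, such as Rabin fingerprinting); the point is only that chunk boundaries follow the content, so a small edit leaves most chunks, and hence their hashes, unchanged.

```python
import hashlib
import random

WINDOW = 48   # rolling-checksum window (illustrative value)
MASK = 0x3FF  # cut a chunk on average every ~1 KB of data

def chunk(data):
    """Cut data where a rolling checksum hits a pattern, so boundaries are
    defined by content rather than by fixed offsets."""
    chunks, start = [], 0
    for i in range(WINDOW, len(data)):
        window_sum = sum(data[i - WINDOW:i])  # toy rolling hash (O(n*W) here)
        if (window_sum & MASK) == MASK:       # content-defined boundary
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

random.seed(0)
data_a = bytes(random.randrange(256) for _ in range(50_000))
data_b = data_a[:100] + b"INSERTED" + data_a[100:]  # small edit near the front

hashes_a = {hashlib.sha256(c).hexdigest() for c in chunk(data_a)}
hashes_b = {hashlib.sha256(c).hexdigest() for c in chunk(data_b)}
print(len(hashes_a & hashes_b), "of", len(hashes_a), "chunk hashes unchanged")
```

With fixed-size blocks, those eight inserted bytes would shift every later block and defeat deduplication; with content-defined boundaries, only the chunk containing the edit changes, and every chunk after it still matches by hash.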