Bangkok--3 Jul--Core & Peak
With all the uncertainty surrounding the world economy right now, organizations are under even more pressure to look at ways to reduce their CAPEX and OPEX.
Deduplication is undoubtedly one of the hottest topics in the storage industry at the moment, not just because it’s the next big thing but because, if implemented correctly, it has the potential to usher in a new paradigm in cost savings.
Adrian De Luca, Director, Storage Management & Data Protection, Asia Pacific, of Hitachi Data Systems explains.
Redundant Data Reduces Storage Efficiency
Data duplication is a primary contributor to the explosive growth in data. For every file stored on your file servers, you could have as many as ten or twenty copies in your storage environment! Does this sound ridiculous?
Let’s take a closer look at your storage infrastructure. Consider the example of a PowerPoint presentation attached to an e-mail. If the e-mail is sent to 50 recipients in your organization, this creates multiple redundant copies of the same original file. For data protection, you create a snapshot to protect yourself from file corruption or local failure (sometimes as many as three copies for faster recovery point objectives), a remote copy to protect against a site failure, and possibly another copy in your third data centre if you are an insurance company, telco or bank. You also back up your data every night, with five differentials during the week and a full backup on weekends, and you keep twelve months of full backups in your tape library. This configuration alone has 900 copies!
The fact is that as the amount of primary data within organizations continues to grow, the amount of repeated data takes a dramatic toll on available storage infrastructure. Redundant data consumes storage resources at an alarming rate, driving up business costs through increased storage purchases and management. Before you know it, storage consumption grows exponentially and becomes extremely difficult to control.
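The article does not spell out how the figure of 900 is reached. One tally that happens to reproduce it is sketched below, purely as an assumption: each of the 50 mailbox copies is counted once on primary storage, three times in snapshots, once at the remote site, once at the third data centre, and once in each of twelve retained full backups (assumed here to be one per month). Differentials are left out on the assumption that an unchanged file is not copied again.

```python
# Hypothetical breakdown: the article states only the total, not this tally.
recipients = 50  # from the article

copies_per_recipient = {
    "primary": 1,
    "snapshots": 3,               # "as many as three copies"
    "remote_copy": 1,
    "third_site_copy": 1,
    "retained_full_backups": 12,  # assumed: one retained full per month
}

per_recipient = sum(copies_per_recipient.values())
total_copies = recipients * per_recipient

print(per_recipient, total_copies)
```

Under these assumptions each recipient's attachment exists 18 times, and 50 recipients bring the total to 900. Other retention schemes would give a different number.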
Does this sound like your environment?
Data deduplication has been around for some time, but more recently the technology has become more prominent in helping to combat the problem of repetitious data in the secondary storage environment. Deduplication ensures that only “unique” data is written during the backup process, which means that significantly less disk capacity is needed on the back-end to store changes. This offers a number of important business benefits, such as reduced CAPEX and OPEX, smaller backup windows, and faster and more reliable recovery.
By leveraging the advanced capabilities of data deduplication technology, the PowerPoint in the initial example can be stored only once, regardless of the number of copies that are demanded. This is an example of data deduplication technology at work at the file level. Next, consider what happens when one of the e-mail recipients modifies a slide in the presentation and again forwards it to a group of colleagues. Advanced data deduplication algorithms can be used to store only the data associated with the changed slides.
Initially, deduplication focused on eliminating data redundancy in specific cases like full backups, e-mail attachments, and VMware OS images. Over time, however, customers have noticed the pervasiveness of duplicated data. Given the impact on the bottom line, organizations are recognizing that, far from being a niche technology, deduplication needs to become an integrated and mandatory element in their overall IT strategy.
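The PowerPoint example can be illustrated with a minimal sketch. It assumes a toy in-memory store that splits each file into fixed-size chunks and keeps one copy of each chunk, keyed by its SHA-256 digest. The class name DedupStore, the 4096-byte chunk size, and the "slides" made of repeated bytes are all hypothetical and are not taken from any product.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept exactly once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 digest -> chunk bytes (stored once)
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # no-op if already stored
            recipe.append(key)
        self.files[name] = recipe

    def get(self, name):
        return b"".join(self.chunks[key] for key in self.files[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


# Hypothetical deck: ten "slides", each exactly one chunk of distinct content.
slides = [bytes([n]) * 4096 for n in range(10)]
deck = b"".join(slides)

store = DedupStore()
for recipient in range(50):
    store.put(f"mailbox{recipient}/deck.ppt", deck)
after_fanout = store.stored_bytes()

# One recipient edits slide 3 and forwards the deck.
edited = b"".join(slides[:3] + [bytes([200]) * 4096] + slides[4:])
store.put("mailbox7/deck-v2.ppt", edited)
after_edit = store.stored_bytes()

print(after_fanout, after_edit)
```

In this sketch the 50 identical attachments occupy the space of one deck, and the edited version adds only the single changed chunk. Fixed-size chunking is used for brevity. It only finds matches when an edit does not shift the data that follows it, so it should be read as an illustration of the idea, not of how any particular product chunks data.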
Common Implementation Challenges of Data Deduplication
Deduplication can revolutionize the way data is stored and protected. While the technology can deliver the required data management benefits without negatively impacting the bottom line, not all deduplication products deliver a complete solution. Some approaches can result in insufficient data reduction (therefore increasing costs), performance bottlenecks, management complexity, and unwanted vendor lock-in.
Some common deduplication implementation challenges are:
- Insufficient data reduction: Whilst there is an expectation that the amount of storage will be reduced with dedupe solutions, the result in a particular environment will vary depending on the technology you choose and the type of data you are looking to deduplicate. Marketing brochures that claim large dedupe ratios are simply an indication of what is possible in the best-case scenario and may not reflect what is achievable in your environment, so it is important to quantify the likely savings. Many vendors offer estimation tools or tests that can be applied to your data to give you a more realistic expectation of what you will get.
- Performance bottlenecks: Finding and eliminating redundant data is an intensive task, and the solution should be sized for the amount of data you have today and the growth you expect. If sized incorrectly, what looks like a cheap solution with appliance-based offerings may turn out to be expensive when you need multiple boxes to do the job. Without contextual knowledge of the data that it deduplicates, some solutions may face significant challenges scaling to the size of most enterprises.
- Increased complexity of management: Many deduplication solutions today behave as if the entire workflow revolves around their value. To reap the benefits of network optimization, an organization needs to install either new hardware or new software in its remote offices. In some cases this may not be practical from a cost or management perspective, so a proper architecture design that takes in all objectives should be done early in the process.
- Islands of deduplication: Proprietary solutions create vendor lock-in, combining limited performance with proprietary storage layouts. They make it nearly impossible to move data from a deduplication appliance to other storage. Also, duplicate data extends across numerous storage tiers, including data replicas, archives, and test-and-development copies. Too often, deduplication solutions address only one of those areas. As a result, you end up limiting future opportunities to further reduce your storage consumption.
Therefore it is important that a holistic approach is taken with an understanding of where all duplication resides, which areas should be attacked first and what savings this will yield. Only after this assessment can a technology decision be made.
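As a rough illustration of the kind of estimate mentioned under insufficient data reduction, the sketch below computes a deduplication ratio (logical bytes divided by unique bytes) over a sample of data. It is not any vendor's estimation tool. The function name estimate_dedup_ratio, the fixed 4096-byte chunking, and the synthetic sample data are assumptions made for the example.

```python
import hashlib


def estimate_dedup_ratio(blobs, chunk_size=4096):
    """Return logical bytes / unique bytes for fixed-size chunks of the blobs."""
    seen = set()   # digests of chunks already counted as unique
    logical = 0    # bytes as the applications see them
    unique = 0     # bytes that would actually need to be stored
    for blob in blobs:
        for offset in range(0, len(blob), chunk_size):
            chunk = blob[offset:offset + chunk_size]
            logical += len(chunk)
            key = hashlib.sha256(chunk).digest()
            if key not in seen:
                seen.add(key)
                unique += len(chunk)
    return logical / unique if unique else 1.0


# Synthetic samples: four identical "full backups" versus four unrelated blobs.
full_backup = b"".join(bytes([n]) * 4096 for n in range(8))
repetitive_ratio = estimate_dedup_ratio([full_backup] * 4)
unrelated_ratio = estimate_dedup_ratio(
    [bytes([n]) * 4096 for n in range(100, 104)]
)

print(repetitive_ratio, unrelated_ratio)
```

The two synthetic samples show why brochure ratios may not transfer: repeated full backups reduce well, while data with no repetition does not reduce at all. Running a calculation of this kind against a representative sample of your own data gives a more grounded starting point for the assessment.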
A Revolutionary Approach
Traditional deduplication products in the market, typically in the form of appliances, sometimes fail to address the challenges mentioned above because they lack a comprehensive approach. These solutions may not address all the different data sets, may be difficult to integrate into your existing environment, or may prove resource intensive to manage once implemented. This leads to a number of inefficiencies that may result in further challenges down the track and make the project look like a complete failure to management.
This is why it is important to understand the new generation of deduplication products now making an appearance in the market.
Some traditional backup software or data protection software vendors are now integrating deduplication directly within their software, close to where the data is managed. This introduces a number of key benefits over previous generations of deduplication technology.
By employing a single, efficient, scalable platform to perform a complete range of data management functions, deduplication can be leveraged not only for traditional backup and recovery but also for snapshot copies, continuous data replication copies and archiving copies.
Another benefit is that deduplication becomes decoupled from a hardware appliance and any proprietary disk that may be attached to it. By leveraging deduplication technology at the software layer, the benefits can be extended to more devices such as storage arrays and tape infrastructure, with simultaneous support for encryption, performance and scalability. Apart from cost savings through reduction of the amount of physical storage required to maintain data protection environments, optimization is also achieved. Next-generation data deduplication built into data protection software has proved able to reduce offsite tape storage by 90%, since any copies to tape can also be deduped.
Integrated software that provides an end-to-end data management solution can deliver the following benefits:
- Improved performance through smart deduplication: Rather than repeatedly backing up the same data over the network, only to discard it at the backend, intelligent integrated software enables a smarter data selection policy that allows only changed objects to be transferred to the backend without impacting recovery. In addition, it maintains a global reference to the existing data segments that are transferred to the backend. In-memory processing ensures redundant data never reaches the disk store. This dramatically improves the scalability of the solution and improves backup performance.
- Improved manageability through an end-to-end solution workflow: By delivering deduplication within a complete backup, archival, or disaster recovery (DR) workflow, integrated software simplifies data management. In fact, by integrating deduplication with broader data management functional areas, implementation and management are simplified, delivering a better ROI in less time.
- Integrated software that provides immediate alerts helps assure data integrity and security. If any deviation from the original data composition occurs due to accidental alteration of data during transmission or storage, the user is immediately notified.
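The first and last points in this list can be sketched together. The example assumes a backend that keeps a global in-memory index of chunk digests, a client that asks which digests are missing before sending anything, and a restore path that re-hashes each chunk and raises an alert on a mismatch. The names Backend, backup and read_verified are hypothetical and do not describe any specific product's interface.

```python
import hashlib


def digest(chunk):
    return hashlib.sha256(chunk).hexdigest()


class Backend:
    def __init__(self):
        self.index = set()       # global in-memory reference of known digests
        self.disk = {}           # digest -> chunk bytes
        self.bytes_received = 0  # data that actually crossed the network

    def missing(self, digests):
        return [d for d in digests if d not in self.index]

    def receive(self, key, chunk):
        self.index.add(key)
        self.disk[key] = chunk
        self.bytes_received += len(chunk)

    def read_verified(self, key):
        chunk = self.disk[key]
        if digest(chunk) != key:  # stored data no longer matches its digest
            raise ValueError(f"integrity alert: chunk {key[:12]} was altered")
        return chunk


def backup(backend, data, chunk_size=4096):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    recipe = [digest(c) for c in chunks]
    by_digest = dict(zip(recipe, chunks))
    for key in backend.missing(by_digest):  # send only what the backend lacks
        backend.receive(key, by_digest[key])
    return recipe


backend = Backend()
monday = b"".join(bytes([n]) * 4096 for n in range(8))
tuesday = monday[:7 * 4096] + bytes([99]) * 4096  # only the last chunk changed

backup(backend, monday)
sent_monday = backend.bytes_received
tuesday_recipe = backup(backend, tuesday)
sent_tuesday = backend.bytes_received - sent_monday
restored = b"".join(backend.read_verified(k) for k in tuesday_recipe)

# Simulate accidental alteration in storage and confirm an alert is raised.
backend.disk[tuesday_recipe[0]] = b"x" * 4096
try:
    backend.read_verified(tuesday_recipe[0])
    alert_raised = False
except ValueError:
    alert_raised = True

print(sent_monday, sent_tuesday, alert_raised)
```

In this sketch the second backup transfers only the one changed chunk, because the lookup against the index happens before any data is sent. The same digests that drive deduplication also serve as the integrity check on restore.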
Benefits of Deduplication
The business benefits of deduplication start with lower data protection costs and extend to efficiency gains such as better recovery times. Data deduplication allows organizations to significantly reduce the amount of disk needed for backup across all storage tiers. With reduced acquisition costs, and reduced power, space, and cooling requirements, your disk becomes suitable for primary backup, restore and retention that can easily encompass months of data. With data on disk, restore service levels are higher, media handling errors are reduced, and more recovery points are available on fast recovery media. The cumulative result is that deduplication-based cost reductions and storage efficiency free up your headcount and resources for more strategic tasks and dramatically speed recovery from any storage tier.
Not surprisingly, organizations experiencing large growth in their primary storage or evolving data protection needs (which include archiving or compliance) will see the biggest benefits of deduplication. The greatest amount of duplication in a storage environment is in the backup of data, so those who perform full backups frequently will see the biggest savings.
Looking into the Future
Like disk-to-disk backup or server virtualization, deduplication should not be evaluated as an isolated product or feature; customers must consider the broader implications of deduplication within the context of their entire data management and storage strategy. The traditional approach by vendors is to offer appliances which can be easily integrated into an existing data centre with minimal disruption. Although this remains a popular option, new integrated options are emerging.
The question is not which technology is better than the other, but rather which is most suitable for your particular environment, taking into account the volume of data you have, the amount of duplication that exists, the existing infrastructure you have and the amount of resources at your disposal to manage it.
Deduplication is evolving from an esoteric technology confined to fixed-size packaging into one that is embedded in various parts of the storage infrastructure. The important thing is that, as customers, you have multiple choices today to address your particular needs.
For more information, please contact:
Srisuput Siangyen
Core & Peak
Tel: 0 2439 4600 ext. 8300
[email protected]