Jeff Tofano - SEPATON

05/03/12

Warning: Big Data will Break Your Backup Part 2

Big Data Demands Big Data Backup
As I mentioned in my last blog post, “Warning: Big Data will Break Your Backup,” we have all gotten used to hearing about huge data growth and its impact on data centers. As daunting as this growth has been, it doesn’t compare to the tsunami being generated in big data environments. Big data is measured in petabytes and is growing by orders of magnitude annually. Simply adding more traditional data protection technologies, such as tape and multiple siloed systems, is not sufficient. They are too slow and too labor-intensive to be feasible and cost-effective in these environments.

Big Data Environments Need a New Approach
Just as big data primary storage is efficient, fast, and scalable, big data secondary storage and data protection need to be efficient, fast, and scalable too. Let’s start with efficiency.

Big data protection comes with big costs. The days of backing up and storing everything (just in case) are over. Cost reduction, labor savings, and system efficiency are paramount if organizations are to protect massively growing amounts of data while meeting backup windows and business objectives. Automatic, tiered data protection that includes continuous data protection (CDP), snapshots, disk-to-disk (D2D) backup, tape, remote replication, and cloud is a must-have. Data protection technologies designed for big data environments need to automatically identify low-priority data and move it to the lowest-cost recovery tier without administrator involvement.

Tiered Storage Becomes Essential
A tiered recovery model enables data managers to balance cost against recovery time, depending on the recovery SLA requirements of each dataset. While low-priority data may have slower restore times, all restores use the same mechanism regardless of tier.
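To make policy-driven tiering concrete, here is a minimal Python sketch, assuming each dataset carries a recovery time objective (RTO) from its SLA. The tier names and restore-time figures are hypothetical placeholders, not measurements and not SEPATON's policy engine: the function simply picks the cheapest tier whose assumed worst-case restore time still meets the dataset's RTO.

```python
from dataclasses import dataclass

# Hypothetical tiers, ordered from fastest/most expensive to slowest/cheapest.
TIERS = ["snapshot", "disk", "replica", "tape_or_cloud"]

# Assumed worst-case restore time per tier, in hours (illustrative only).
RESTORE_HOURS = {"snapshot": 0.25, "disk": 2, "replica": 8, "tape_or_cloud": 48}


@dataclass
class Dataset:
    name: str
    rto_hours: float  # recovery time objective taken from the dataset's SLA


def assign_tier(ds: Dataset) -> str:
    """Return the cheapest tier whose assumed restore time meets the dataset's RTO."""
    # Walk from cheapest to most expensive; the first tier that satisfies the SLA wins.
    for tier in reversed(TIERS):
        if RESTORE_HOURS[tier] <= ds.rto_hours:
            return tier
    # No tier meets the SLA: fall back to the fastest tier available.
    return TIERS[0]


for ds in [Dataset("orders_db", 1), Dataset("email", 4), Dataset("old_logs", 72)]:
    print(ds.name, "->", assign_tier(ds))
```

Because placement is derived from the SLA rather than from an administrator's judgment, low-priority data moves to a cheaper tier without manual intervention. A real policy would likely also weigh factors such as access age and retention rules.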

Second, even with smart, policy-driven tiering, big data environments still need fast, scalable ingest performance to meet their data protection needs. Stacking up dozens of single-node, inline deduplication systems will lead to crushing data center sprawl and complexity. To meet data protection needs and backup windows, big data environments need massive single-system capacity coupled with scalable, multi-node deduplication that doesn’t slow data ingest or bog down restores.
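One generic way to scale deduplication across nodes without creating separate silos is to shard a single, global fingerprint index: each chunk's hash determines which node owns it, so a duplicate is detected no matter which node ingests it. The Python sketch below is a toy illustration of that idea under simplifying assumptions (SHA-256 fingerprints, an in-memory dict per node, sequential rather than parallel ingest). It is not a description of SEPATON's architecture, and `DedupCluster` is a hypothetical name.

```python
import hashlib


class DedupCluster:
    """Toy multi-node deduplication: each fingerprint is owned by exactly one node."""

    def __init__(self, num_nodes: int):
        # One fingerprint index (fingerprint -> stored chunk) per hypothetical node.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _owner(self, fp: bytes) -> int:
        # Route by fingerprint so identical chunks always land on the same node.
        return int.from_bytes(fp[:8], "big") % len(self.nodes)

    def ingest(self, chunk: bytes) -> bool:
        """Store the chunk if it is new; return True if it was a duplicate."""
        fp = hashlib.sha256(chunk).digest()
        index = self.nodes[self._owner(fp)]
        if fp in index:
            return True
        index[fp] = chunk
        return False

    def stored_bytes(self) -> int:
        # Physical bytes held across all nodes after deduplication.
        return sum(len(c) for node in self.nodes for c in node.values())


cluster = DedupCluster(num_nodes=4)
chunks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
dups = [cluster.ingest(c) for c in chunks]
print(dups, cluster.stored_bytes())
```

Adding a node adds index and storage capacity to the same deduplication domain, whereas independent single-node systems each keep their own index and cannot recognize duplicates stored on a neighbor. The simple modulo routing here would remap ownership if the node count changed; a production design would more likely use something like consistent hashing.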

Compatibility with Existing Infrastructure and Reporting
Third, given the scale of big data environments, high-risk rip-and-replace solutions are out of the question. New data protection solutions need to coexist and integrate seamlessly with existing environments without disruption or added complexity. Ideally, IT staff can manage the resulting environment, both new and existing IT resources, from a single view that consolidates management and reporting for the entire infrastructure.

Database Deduplication Efficiency
Fourth, big data environments often contain a large volume of duplicate data in segments too small for many of today’s deduplication technologies to detect. New deduplication technologies optimized for big data are required to minimize costs and footprint regardless of where the data is retained.
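The effect of segment size can be shown with a small experiment. The Python sketch below assumes simple fixed-size chunking with SHA-256 fingerprints and a made-up table in which every other 64-byte record is an identical template row. Chunking at the record size finds those repeats; chunking at 4 KB does not, because every 4 KB chunk also contains unique records. This illustrates the general principle only. It is not a big-data-optimized technique, and production systems typically use variable-size or content-aware segmentation.

```python
import hashlib


def unique_record(i: int) -> bytes:
    # 64 bytes of content that differ for every i (two SHA-256 digests).
    return hashlib.sha256(b"u%d" % i).digest() + hashlib.sha256(b"v%d" % i).digest()


# Hypothetical table of 1,024 records: every odd record is the same 64-byte template row.
TEMPLATE = b"T" * 64
data = b"".join(TEMPLATE if i % 2 else unique_record(i) for i in range(1024))


def duplicate_bytes(data: bytes, chunk_size: int) -> int:
    """Count bytes that fixed-size chunking at this size recognizes as duplicates."""
    seen = set()
    dup = 0
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        fp = hashlib.sha256(chunk).digest()
        if fp in seen:
            dup += len(chunk)
        else:
            seen.add(fp)
    return dup


for size in (64, 4096):
    print(size, duplicate_bytes(data, size))
```

With 64-byte chunks, 511 of the 512 template rows are recognized as duplicates (32,704 bytes); with 4 KB chunks, no duplicates are found at all, even though half the table is repeated content.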

The bottom line: traditional data protection cannot handle demanding big data environments. A new approach is needed that delivers:
• Fast, deterministic ingest and egress performance
• Intelligent, automated tiering
• Enterprise-wide management control and reporting
• Scalable deduplication designed specifically for big data database environments

In an upcoming blog post, I will cover big data in more depth, with a discussion of what is needed in a big backup appliance.

03/29/12

Warning: Big Data will Break Your Backup

The volume of data that every enterprise is dealing with continues to grow at an alarming rate, and backup has been at the forefront of this wave. In fact, the amount of data being protected is growing so fast, and the solutions are so broken, that it’s like pouring gasoline on a fire. But this is just the beginning. As technologies like cloud and big data continue to take hold, the problem is getting significantly worse, and most traditional storage approaches will fail under the weight of this growth. The “big data” wave will ripple through enterprise environments, starting in primary storage and building to a tsunami of epic proportions by the time it reaches the data protection environment. It will create fire drill after fire drill.

Backup is the first place customers will feel the pain. Any non-scalable data protection technology will leave customers with a hard choice: keep buying difficult-to-manage data protection silos, or switch to truly scalable data protection architectures that have a chance of solving the problem. “Big Data Backup” is about to hit us all and break the infrastructure we’ve been building for the last decade.

But the trend won’t stop there. Current methodologies for archiving data and staging it for traditional analytics platforms will also fail as data sets grow to sizes that turn simple data transfers into major pain points. The traditional scale-out NAS approach just won’t keep up. Newer, massively scalable object repositories will become the only viable solutions going forward. Any second-tier solution not built around these contemporary technologies will eventually lose its competitive position.

The pain of building scalable primary cloud and analytics repositories is already being felt. Most enterprise customers simply can’t approach the problem the way Google, Yahoo, and other engineering-dominant companies do. Massively scalable, manageable analytics platforms are needed, and they must interoperate efficiently with similarly designed, scalable data protection platforms.

Interoperability is not just a nice-to-have; it is essential. Really big data needs to move as little as possible, and when it does move, efficiency becomes critical. Most traditional analytics storage technologies are already buckling under the current load. With no slowdown in sight, we’re sure to see these traditional approaches fail completely in the near future.

So what is a customer to do? First, be aware of the size and scope of the problem. It’s overwhelming, and it’s going to get much, much worse. Second, be cautious about committing to tactical fixes and traditional siloed approaches. Even if they let you “get by” for a while, they will eventually fail on all fronts, causing massive added cost, weak manageability, and data movement that will crush you.

Third, start evaluating emerging “Big Data Backup,” “Big Archive,” and “Big Data” solutions, and build plans to integrate them into your storage ecosystem soon. Finally, remember that emerging solutions must not only scale and perform cost-effectively but also be manageable in every respect. IT staff are being asked to manage more and more data per person. In fact, the volume of data each IT staff member manages is getting ridiculous, so ease of deployment, upgrade, and day-to-day operation is critical. These solutions must also automate data management and integrity tasks, such as periodically scrubbing data. The bigger data stores get (and they will get huge in coming years), the more data integrity becomes a concern.
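Data scrubbing means periodically re-reading stored data and checking it against checksums recorded at write time, so silent corruption is found before a restore depends on it. The Python sketch below shows only the detection step, assuming SHA-256 checksums and an in-memory store. `ChecksummedStore` is a hypothetical name, and scheduling the scrub and repairing damaged objects (for example, from a replica) are left out.

```python
import hashlib
from typing import Dict, List


class ChecksummedStore:
    """Toy object store that records a SHA-256 checksum for every object at write time."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}   # key -> bytes as currently stored
        self.checksums: Dict[str, str] = {}   # key -> checksum captured at write time

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        self.checksums[key] = hashlib.sha256(data).hexdigest()

    def scrub(self) -> List[str]:
        """Re-read every object and return the keys whose content no longer matches."""
        return [key for key, data in self.objects.items()
                if hashlib.sha256(data).hexdigest() != self.checksums[key]]


store = ChecksummedStore()
store.put("backup-001", b"payroll data")
store.put("backup-002", b"customer data")
# Simulate silent corruption of a stored object (bypassing put, so no new checksum).
store.objects["backup-002"] = b"customer dat4"
print(store.scrub())
```

Running the scrub on a schedule, rather than only at restore time, is what lets a large store catch corruption while a good copy still exists elsewhere.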

In an upcoming blog post, I will discuss the demands that big data is making on backup environments in more detail.

02/15/12

10 Questions You Must Ask Your Data Protection Vendor About Big Backup

Large enterprises must look closely at the strengths and weaknesses of the available deduplication technologies on the market before choosing a solution that best meets the needs of their own Big Backup environments.

The volume of data generated by most companies is now growing at such an explosive rate that many data centers have simply run out of the space, power, cooling, and storage capacity required to handle it. In large enterprise organizations with ‘Big Backup’ environments, the sheer volume and variety of data to be protected require a level of performance and scalability that few data protection technologies can deliver.

Keeping up with exponential data growth within budget, space, power, and cooling constraints is a constant data center challenge. It is compounded by increasingly stringent regulatory requirements and by business initiatives demanding higher service levels, longer online retention times, and higher levels of data protection. Several deduplication technologies are available to meet the needs of small-to-medium-sized organizations. However, for large enterprise organizations with Big Backup environments, most deduplication technologies fall short, resulting in costly, inefficient capacity reduction and time-consuming administration. Understanding the distinctions between these technologies is essential for choosing those most appropriate for Big Backup.

SEPATON recommends that all large enterprises ask their backup technology vendors the following ten questions. The answers will help them gain a clear understanding of the strengths and drawbacks of each option so they can choose the one that best meets their Big Backup requirements:

1. What impact will deduplication have on backup performance, both now and over time?
2. Will deduplication degrade restore performance?
3. How will capacity and performance scale as the environment grows?
4. How efficient is the deduplication technology for large databases (e.g. Oracle, SAP, SQL Server)?
5. How efficient is the deduplication technology in progressive incremental backup environments, such as Tivoli Storage Manager (TSM), and with NetBackup OST?
6. What are realistic expectations for capacity reduction given the high data change rate common in Big Backup environments?
7. Can administrators monitor backup, deduplication, replication and restore processes enterprise-wide?
8. Can deduplication help reduce replication bandwidth requirements for large enterprise data volumes without slowing backup performance?
9. Can IT teams ‘tune’ deduplication by data type to meet their specific needs?
10. How much experience does the vendor have with large enterprise backup applications such as Symantec NetBackup/OST and TSM?

For more details, see the white paper titled ‘Choosing an Enterprise-Class Deduplication Technology’.