Multi-cloud futures are driving evolved data strategies for cloud analytics
Posted in Tech & Trends on Dec 14, 2018
Today, the immediate availability of analytics has become business-critical. Jagane Sundar comments on the pressures that are prompting new data strategies for cloud analytics, and how enterprises are preparing for multi-cloud environments.
DISCOtecher: Perhaps to begin with it might be helpful to understand exactly what we mean by analytics infrastructure. What is your preferred definition?
Jagane: Good point. I think Robert L. Grossman from the University of Illinois at Chicago and Open Data Group provides a clear definition: “Analytic infrastructures [are] the services, applications, utilities and systems that are used for either preparing data for modeling, estimating models, validating models, scoring data, or related activities. For example, analytic infrastructure includes databases and data warehouses, statistical and data mining systems, scoring engines, grids and clouds”.
DISCOtecher: You have commented in the past that analytics infrastructure has been evolving, and that you see a worrying trend. What is this?
Jagane: Analytics are business-critical, and I think that’s a given for modern enterprises. Insurers, airlines, retailers, auto-makers, and all kinds of enterprises need their analytics engines to be always up, always available, regardless of what happens to the data center. There are many pressures on keeping analytics up 24x7x365, including the challenges of managing increasingly distributed data sources and analytics teams.
This almost total shift from transaction capability to analytics capability is a significant change, and it has happened over an extended period.
While core transaction systems have attracted significant infrastructure investment, particularly for disaster recovery and data protection, analytics infrastructure has been somewhat overlooked.
DISCOtecher: With enterprises managing increasingly distributed data stores and no prevailing strategy to protect analytics infrastructure, quite a lot is at stake.
Jagane: Correct. The biggest single challenge for analytics teams and their systems is a potential lack of data availability. Whether the data is in the cloud or on-premises, and whether it serves traditional BI, sophisticated ML models or experimentation environments, all of it needs proper backup capabilities.
There has been enormous progress on application resilience, but not so much on data. For example, if a data center goes down, cloud providers have many recovery options to bring up application instances at very short notice. But what about the data itself? Even if you can recover applications at secondary locations, do you have a complete, current copy of the data to continue operations? The major challenge is therefore ensuring that data is consistent between data centers, whether on-premises, hybrid or multi-cloud. And where analytics systems depend on that data, and the business depends on real-time analytics, working from inconsistent or old data could cause revenue loss or other damage to the enterprise.
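As an illustration only, the question "do you have a complete, current copy of the data?" can be sketched as a comparison of content digests between two copies. In this minimal sketch the two stores are modelled as plain Python dictionaries, and the object keys and the function name `diff_copies` are hypothetical; a real check would read from the actual storage systems.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of an object's contents."""
    return hashlib.sha256(data).hexdigest()


def diff_copies(primary: dict, secondary: dict) -> dict:
    """Compare two {key: bytes} copies and report where the secondary differs."""
    missing = sorted(k for k in primary if k not in secondary)
    extra = sorted(k for k in secondary if k not in primary)
    stale = sorted(
        k for k in primary
        if k in secondary and digest(primary[k]) != digest(secondary[k])
    )
    return {"missing": missing, "extra": extra, "stale": stale}


# Hypothetical data: the secondary copy lags behind the primary.
primary = {
    "trades/001": b"a",
    "trades/002": b"b-v2",
    "trades/003": b"c",
}
secondary = {
    "trades/001": b"a",
    "trades/002": b"b-v1",
}

report = diff_copies(primary, secondary)
consistent = not any(report.values())
print(report)
print("consistent:", consistent)
```

A secondary site that reports missing or stale objects could restart the applications, but the analytics running there would be working from incomplete or old data.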
Even more worrying, cloud object stores are now the prevailing paradigm, and there is a common assumption that the cloud provides sufficient data protection.
Enterprises that keep all of their data analytics and experimentation in the cloud are relying on cloud-embedded replication tools. The truth is, for modern analytics, the resilience offered by batch-based or time-based copies of data will not be good enough.
DISCOtecher: Why are cloud-embedded replication tools not sufficient?
Jagane: Traditional replication solutions offer ‘eventual data consistency.’ In other words, data is replicated in one direction, from point A to point B, within a certain number of hours. But this kind of replication depends on a host of factors, such as busy network links and processing capacity, all of which are shared resources over which you have no control.
Cloud replication solutions take snapshots of the data, and even though the snapshots might be frequent, they are out of date from the very next moment. If an object store fails and the application is restarted using the most recent snapshot, it introduces problems of data consistency, which must be resolved before analytics can recover.
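The snapshot problem can be made concrete with a small worked example. This is a minimal sketch under assumed numbers: the 15-minute snapshot interval, the write times and the failure time are all hypothetical, chosen only to show how writes made after the last snapshot are absent from the restored copy.

```python
from datetime import datetime, timedelta


def latest_snapshot_before(snapshot_times, failure_time):
    """Return the most recent snapshot taken at or before the failure, or None."""
    candidates = [t for t in snapshot_times if t <= failure_time]
    return max(candidates) if candidates else None


def lost_writes(write_times, snapshot_times, failure_time):
    """Return writes made after the last snapshot and up to the failure."""
    snap = latest_snapshot_before(snapshot_times, failure_time)
    return [
        w for w in write_times
        if w <= failure_time and (snap is None or w > snap)
    ]


start = datetime(2018, 12, 14, 9, 0)
# Hypothetical schedule: one snapshot every 15 minutes, 09:00 to 10:00.
snapshots = [start + timedelta(minutes=15 * i) for i in range(5)]
# Hypothetical writes at 09:05, 09:20, 09:47, 09:52 and 09:58.
writes = [start + timedelta(minutes=m) for m in (5, 20, 47, 52, 58)]
# Hypothetical failure at 09:59, just before the next snapshot.
failure = start + timedelta(minutes=59)

snap = latest_snapshot_before(snapshots, failure)
missing = lost_writes(writes, snapshots, failure)

print("restored from snapshot at:", snap.time())
print("exposure window:", failure - snap)
print("writes missing after restore:", len(missing))
```

In this scenario the restore comes from the 09:45 snapshot, so the three writes made in the following 14 minutes are not in the recovered data. Taking snapshots more often narrows the window but does not close it.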
DISCOtecher: So what are the new strategies for the evolving cloud landscape?
Jagane: For both hybrid analytics and all-cloud analytics, enterprises need a new data strategy, one that can provide consistent data whether on-premises or in the cloud, even when data is changing and at very large, even petabyte, scale.
What we term a ‘LiveData’ strategy means you can run your analytics applications on-premises or in the cloud, and obtain the same results, because the data is consistent everywhere.
In practical terms, if an object store such as Amazon S3 fails for whatever reason, with a LiveData environment there is no roll-back problem; the application can use any available data source, because the data is consistent. A LiveData strategy eliminates the weakness of time-bound cloud backup and delivers a near-zero recovery point objective (RPO) and near-zero recovery time objective (RTO) if there is an on-premises data outage. The key deliverable is that at any moment, data is consistent at all your endpoints, and this is the revolutionary effect of LiveData.
DISCOtecher: Could you give me an example of a customer that is protecting their analytics data?
Jagane: Yes, I’m familiar with a global banking institution that had no comprehensive disaster recovery solution for its analytics platforms that were providing risk and compliance capabilities. Historically, system outages had resulted in significant reputational damage and regulatory fines, with large teams deployed to mitigate the risk. By creating a LiveData environment powered by WANdisco Fusion, they have reduced operational risk by implementing disaster recovery across all risk and compliance analytics platforms and fully protected themselves against data loss. In turn this reduces the volume and cost of regulatory fines sustained globally, and cuts the FTE headcount required to manage, monitor and provide resiliency for analytics infrastructure.
DISCOtecher: Great interview, Jagane, thanks very much for your time and insights!
From on-premises to cloud, or across multiple cloud locations, even with multiple cloud providers, WANdisco Fusion ensures consistent data even as it changes. WANdisco Fusion enables an enterprise LiveData capability, and this is the new data strategy that is essential for mission-critical analytics.