Fulfilling its real-time Big Data analytics potential: Azure HDInsight Service now supports live multi-location data synchronization

By WANdisco, Sep 26, 2017

Finally, a breakthrough that promises to transform the way companies use advanced Big Data analytics: Microsoft has announced that Azure HDInsight Service users will now be able to replicate data accurately and in real time across two or more locations via a single-click installation of WANdisco Fusion® to an Azure HDInsight cluster.

This means that it will now be possible to synchronize live data sets between two or more locations in real time – i.e. between day-to-day business systems that are being updated continuously, and the externally-hosted systems and services that are simultaneously crunching that data for other more elaborate and strategically important purposes. In the case of the Microsoft Azure HDInsight Service – Microsoft’s cloud-based solution for Big Data analytics – that secondary use could be social media tracking, IoT/health monitoring, or fraud analytics, for instance. That is, data-intensive applications that are processing and responding to live information feeds on the fly.

The risk with trying to work with live data distributed across more than one geographic location is that unless it is being continuously replicated, there will always be disparity between the different end points. It’s a bit like complex documents requiring input from multiple parties. Without systematic version management or controlled document sharing, there’s always a risk that someone may be working with an older copy of the content – causing chaos.

Shoring up reserves

Without an authoritative, agreed single version of the data ‘truth’, there will be implications not only for the currency of analytics output and the actions this triggers, but also for other scenarios which depend on absolute data synchronicity. 

An obvious one is disaster recovery/business continuity. This is a common first use case for the Cloud: the economics of using a pre-existing, pre-vetted third party to host a copy of important data are very appealing to businesses compared with setting up their own secondary data center.

But something they may not be aware of is that, where live systems and real-time data are involved, business continuity can only be assured if those secondary data sets are as complete and up-to-date as the data residing in core, internal systems. If the data involved is on a substantial scale (so that backups rely on data being copied across overnight, or via physical transit between locations using hard disks), the lag between updates poses a practical problem.  

If a major transactional system goes down and the backup copy held somewhere else is up to a day out of sync, that could be a whole day’s bookings, sales or analysis lost. If live systems and remote backups are out of sync by anything more than a few minutes, the time taken to restore live activity – and the disruption incurred in the meantime – could be significant. And of course data consumption is growing by the day. IDC predicts that by 2025, annual data generation will reach 16.1 zettabytes (a zettabyte is a trillion gigabytes) – 10 times that produced in 2016. So this is a situation that is only going to intensify.
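The size of that exposure is easy to estimate. As a back-of-envelope sketch (the transaction rates here are hypothetical, purely for illustration), the worst-case loss on failover is simply the sync interval multiplied by the write rate:

```python
# Back-of-envelope sketch: worst-case data loss (the "recovery point")
# for periodic backups vs. near-continuous replication.
# The transaction rate below is a hypothetical example figure.

def worst_case_lost_transactions(sync_interval_s, tx_per_second):
    """Anything written since the last sync is lost on failover."""
    return sync_interval_s * tx_per_second

TX_RATE = 50  # hypothetical: 50 writes per second

nightly = worst_case_lost_transactions(24 * 3600, TX_RATE)  # once-a-day copy
live = worst_case_lost_transactions(1, TX_RATE)             # ~1 s replication lag

print(nightly, live)  # 4320000 vs 50 transactions at risk
```

The gap between the two numbers, not the absolute figures, is the point: moving from a nightly copy to second-level replication shrinks the exposure by several orders of magnitude.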

Active, ongoing data replication protects organizations against downtime, because it ensures that there is always a true, current copy of the live data in a second location that can be swapped in to play at a moment’s notice.
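The difference from batch backup can be shown with a toy sketch. This is a conceptual illustration only – it is not the WANdisco Fusion API, and the `Replica`/`active_replicate` names are invented for the example – but it captures the idea that each write is applied to both copies as it happens, so the secondary is always current:

```python
# Toy illustration of active replication (not the WANdisco Fusion API):
# every write is applied to the primary and the secondary together,
# so the secondary copy is always a current, true copy.

class Replica:
    """A minimal stand-in for a data store at one location."""
    def __init__(self):
        self.data = {}

def active_replicate(key, value, primary, secondary):
    """Apply a single write to both copies as it happens."""
    for replica in (primary, secondary):
        replica.data[key] = value

primary, secondary = Replica(), Replica()

# A stream of live business writes (hypothetical example records).
for key, value in [("booking:1", "LHR->JFK"), ("booking:2", "SFO->SEA")]:
    active_replicate(key, value, primary, secondary)

# If the primary fails now, the secondary can be swapped in with no loss.
assert secondary.data == primary.data
```

In a real system the replication step also has to handle ordering and conflicts across locations, which is precisely the hard part that a coordination layer such as Fusion is designed to solve.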

Defaulting to the cloud

Hybrid infrastructure scenarios – where organizations run some systems internally, but use the cloud for particular applications or processes – also depend on synchronicity. If on-premises systems and remotely-hosted applications share data, it had better be identical. Gartner predicts that by 2020, 90 percent of organizations will adopt hybrid infrastructure management capabilities. So, again, the importance of solving the continuous data synchronization issue will only grow over time.

Active replication also paves the way for companies to ‘burst’ into the cloud – tapping into flexible, affordable additional data storage capacity and processing power on demand to service peak demand, special compute-intensive projects, or pop-up offices. As our reliance on Big Data continues to grow, we can expect organizations to do this routinely. Analyst firm 451 Research notes that using an on-site private cloud environment combined with burst capacity to public clouds is often more economical and less disruptive than putting everything in the public cloud.

These sought-after scenarios just wouldn’t be viable without the assurance of completely consistent data between the dispersed IT locations – not without a great deal of complexity and additional cost. So the Microsoft announcement is an important milestone for Azure HDInsight Service.

It means organizations can do even more with their data – reliably, in the cloud. They’re covered by important controls too – for instance, over which subsets of content go where, satisfying data sovereignty, data protection and data availability requirements. 

Most importantly, the volume of data is no limit to what users can do with it. Because data is continuously synchronized, companies can avoid the hassle and disruption of shipping physical data containers between locations to mine it for new insights – an impractical workaround the data center industry has had to come up with as the world’s hunger for data and its insights soars.



