WANdisco introduces Hive metadata migration to Databricks
May 28, 2021
WANdisco has added a new capability to its LiveData Migrator product that allows customers to move Apache Hive metadata into Databricks.
Customers trying to migrate on-premises Hadoop to Databricks in the cloud can now use LiveData Migrator's new capability to ensure full functionality at the target. LiveData Migrator incrementally converts Hive metadata to the Delta format during the migration, so relationships between the Hadoop data are maintained once it lands on Databricks.
Previously, when migrating Hadoop data to a cloud data warehouse, LiveData Migrator moved only the raw data itself -- not the dependencies within it. To ensure that their applications still worked once they were in the cloud, customers had to manually re-establish those relationships through a labor-intensive process of rewriting Hadoop code for the new cloud architecture.
WANdisco now automates that transformation for Databricks, making Hadoop and Hive data immediately available in Delta Lake on Databricks.
"It's not enough to just move the data," said WANdisco CEO David Richards.
Transforming the Hadoop and Hive data as it lands means customers can use their new cloud-based data and applications much faster and without the risk of failure inherent in doing it manually, Richards said. Customers' Hadoop environments tend to be at petabyte (PB) scale, which makes the cloud migration task even more difficult. Customers recognize migration is essential, as the alternative is to keep buying hardware to support the environment's growth -- something that will eventually become unsustainable, Richards said.
LiveData Migrator can apply changes made at the data source to the data target mid-migration, and it can do the same with Hive metadata migrations. Because all ongoing changes are captured, customers don't have to take down their production environments to perform a migration. Some WANdisco customers handle millions of transactions per second, according to Richards, making any amount of downtime unfeasible.
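The general idea of migrating metadata without downtime can be illustrated with a minimal Python sketch. It is not WANdisco's implementation, and every name in it (Change, migrate, the table names) is hypothetical. It assumes the migration makes a bulk copy of a snapshot of the source catalog and then replays, in order, the changes that were captured at the source while the copy was running.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Schema = Dict[str, str]  # column name -> column type


@dataclass
class Change:
    """One change captured at the source during migration (hypothetical type)."""
    seq: int                   # order in which the change happened at the source
    table: str                 # table the change applies to
    schema: Optional[Schema]   # new schema, or None if the table was dropped


def migrate(snapshot: Dict[str, Schema], change_log: List[Change]) -> Dict[str, Schema]:
    """Copy a catalog snapshot, then replay changes captured during the copy."""
    target: Dict[str, Schema] = {}

    # Phase 1: bulk copy of the snapshot taken when the migration started.
    for table, schema in snapshot.items():
        target[table] = dict(schema)

    # Phase 2: replay captured changes in source order so the target
    # converges on the current state of the source.
    for change in sorted(change_log, key=lambda c: c.seq):
        if change.schema is None:
            target.pop(change.table, None)   # table dropped at the source
        else:
            target[change.table] = dict(change.schema)  # created or altered

    return target


# Illustrative data only: the source keeps changing while the copy runs.
snapshot = {"orders": {"id": "bigint"}, "tmp_stage": {"x": "string"}}
change_log = [
    Change(2, "tmp_stage", None),
    Change(1, "orders", {"id": "bigint", "total": "double"}),
    Change(3, "customers", {"id": "bigint"}),
]
result = migrate(snapshot, change_log)
```

In this sketch the source never stops accepting changes. The target catches up by replaying the captured log instead of requiring a freeze on the production environment.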
LiveData Migrator's Hive metadata migration capability currently works only with Databricks, but WANdisco is working on extending it to Snowflake. WANdisco targeted Databricks first because that's where most Hadoop users are migrating, Richards said.
Most migration tools move only the data, making WANdisco's new capability relatively uncommon. Next Pathway is another migration vendor that can perform PB-scale migration to cloud data warehouses while keeping data dependencies intact.
The journey to the cloud for Hadoop environments is "pretty inevitable," said Merv Adrian, research vice president at Gartner. There comes a time in every environment when customers weigh the cost of the cloud against the depreciation of their hardware. For large-scale Hadoop environments, the cloud offers the better value proposition.
Making the move to the cloud is the tricky part, Adrian said. It's a time-consuming, manual, risky and disruptive process. It's also a one-way move, so it's highly unlikely that any organization has staff members who are experts at performing the migration. A third-party vendor would have that expertise and is the safest option, making WANdisco well positioned to address an emerging market, Adrian said.
"There are a lot of people with lots of nodes of Hadoop, and this is de-risking a process people are worried about," Adrian said.
The biggest hurdle to Hadoop migration is that Hadoop is a high-transaction environment, so the rate of change for its data is very high, Adrian added. One of WANdisco's greatest benefits is that it allows the environment to operate normally while the migration is happening.