Data Migrator
Move your data and metadata to the cloud or between clusters, with no downtime and no service disruption

“WANdisco's uniqueness lies in how it packages Hadoop data migration as a fully hands-off service. Moving data under active change is delicate, and organizations don't want to use their best IT people on it. WANdisco's Data Migrator handles everything in the background and doesn't require expertise from the customer. It's as close to a silver bullet as you can find for this type of project.”
Merv Adrian, Gartner Research Vice President of Data and Analytics
What is Data Migrator?
Migrate from Hadoop to cloud without disruption or downtime.
Data Migrator is a fully automated cloud migration solution that migrates HDFS data and Hive metadata to the cloud, even while those data sets are under active change. It is fully self-service, requiring no WANdisco expertise and zero changes to applications or business operations. Migrations of any scale can begin immediately and run while the source data is under active change, without production system downtime, business disruption, or risk of data loss.
Immediate
Administrators can deploy the solution and begin migrating data lake content to the cloud immediately. It is entirely non-intrusive, requiring zero changes to application, cluster, or node configuration or operation.
Live
Leveraging WANdisco's live data capabilities, Data Migrator migrates data while the source is under active change, without requiring any production system downtime or business disruption, supporting complete and continuous data migration.
Scalable
Data Migrator accommodates migration at any scale, from terabytes to exabytes, without any risk of data loss.
Benefits of Data Migrator
Data Migrator enables you to transition to a live data environment that makes your data globally available, accurate, and protected, avoiding the costs of a manual migration and the data silos that emerge when data cannot be kept consistent.

Business Continuity
- No need for downtime of on-premises production clusters
- Immediate availability of migrated data
- High scalability and performance for migration at any scale

Complete and Continuous Migration
- Data migration in a single pass of the source storage
- Ongoing migration of any subsequent data changes
- Ensures zero data loss of source data and changes

Cost Avoidance
- Minimizes the need for IT resource involvement
- Automated migration without custom code maintenance
- Faster time-to-value and adoption of AI and ML
Data Migrator Automates Cloud Migration
Zero Business Disruption, Zero Risk, and Best Time-to-Value
Quick deployment and operation
Data Migrator is installed on an edge node of your Hadoop cluster. Deployment can be performed in minutes without impacting current operations, so users can begin migrations immediately.
Complete and continuous migration
Migrates existing datasets with a single pass through the source storage system, eliminating the overhead of repeated scans, while also supporting continuous migration of any ongoing changes from source to target with zero disruption to current production systems.
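The single-pass-plus-continuous pattern can be illustrated with a toy sketch (Python; the names and in-memory model here are illustrative assumptions, not Data Migrator's actual implementation): an initial scan copies existing content once, then a log of changes captured during and after the scan is replayed so the target converges without any rescans.

```python
# Toy sketch of single-pass migration plus continuous change replay.
# Source and target are plain dicts mapping path -> content; the real
# product works against HDFS and cloud storage, not in-memory maps.

def migrate(source, target, change_log):
    # Single pass: copy every file that exists at scan time.
    for path, content in list(source.items()):
        target[path] = content
    # Continuous phase: replay changes captured during and after the
    # scan, so the target converges without rescanning the source.
    for op, path, content in change_log:
        if op == "put":
            target[path] = content
        elif op == "delete":
            target.pop(path, None)
    return target

source = {"/data/a.csv": "v1", "/data/b.csv": "v1"}
changes = [("put", "/data/a.csv", "v2"),
           ("delete", "/data/b.csv", None),
           ("put", "/data/c.csv", "v1")]
target = migrate(source, {}, changes)
print(target)  # {'/data/a.csv': 'v2', '/data/c.csv': 'v1'}
```

Because only the change log is replayed after the initial scan, the cost of keeping the target current scales with the rate of change, not with the total size of the source.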
Multiple source and target systems support
Supports HDFS distributions v2.6 and higher as source systems, and all leading cloud service providers and other select ISVs, such as Databricks and Snowflake, as the target systems. See the Data Migrator documentation and release notes for details.
Browser-based user interface
Users can manage the full data migration (data and metadata) from a single management console via WANdisco's browser-based user interface (UI).
Configurability and control
Configure migrations to meet your organization's specific needs, from standard settings such as defining sources, targets, and the data to be migrated, to advanced capabilities such as migration prioritization, path mapping, and network bandwidth management controls.
Migration at any scale
Migrates big datasets at any scale, from terabytes to multi-petabytes, without impacting current production environments. Horizontal scaling capabilities allow users to scale their migration capacity by configuring transfer agents to maximize the productivity of available bandwidth.
Hadoop data and Hive metadata migration
Supports migration of HDFS data and Hive metadata to any public cloud, as well as to other on-premises environments.
Migration verification
Migration verification scans both source and target environments to ensure data fidelity and validate the success of all data migrations. Notifications can report the status of migration verifications and deliver the results by email.
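The idea behind verification can be sketched as a digest comparison (a minimal illustration in Python; the function names and report shape are assumptions, not the product's actual verification algorithm): hash the content on both sides and report any mismatched or unexpected paths.

```python
# Illustrative verification sketch: compare content digests of source
# and target to confirm fidelity. Hypothetical; not the product's code.
import hashlib

def digest(content):
    return hashlib.sha256(content.encode()).hexdigest()

def verify(source, target):
    # Paths missing from the target or with differing content.
    mismatched = [p for p in source
                  if p not in target or digest(source[p]) != digest(target[p])]
    # Paths present on the target that the source never had.
    extra = [p for p in target if p not in source]
    return {"ok": not mismatched and not extra,
            "mismatched": mismatched, "extra": extra}

src = {"/data/a.csv": "v2", "/data/c.csv": "v1"}
ok_report = verify(src, dict(src))
bad_report = verify(src, {"/data/a.csv": "stale"})
print(ok_report["ok"], bad_report["mismatched"])
# True ['/data/a.csv', '/data/c.csv']
```

A mismatch report like this is the kind of result a verification notification could summarize and deliver by email.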
Programmatic interface
Migrations can also be managed through a comprehensive and intuitive command-line interface, or via the self-documenting REST API to integrate the solution with other programs as needed.
Metrics and monitoring
Health and status metrics keep you updated on migration jobs, providing estimates for migration completion, email notifications, and real-time usage insights that support hands-off operation.
The Data Migrator Approach
Only Data Migrator is able to move data lake content to the cloud immediately, at scale, with no application downtime and no risk of data loss, even when data sets are under active change.
Other approaches to large-scale Hadoop data migration rely on repeated copy iterations that do not account for changes occurring in the meantime. They require significant up-front planning and impose operational downtime whenever data must be migrated completely.
