What is Apache Hive?
Apache Hive is a data warehouse system built on top of Apache Hadoop that allows easy querying, analysis and reporting of massive datasets distributed across various systems, file stores and databases.
It provides an abstraction for applications that need structured access to data residing in a Hadoop cluster, so that ad-hoc querying, summarization and other data analysis tasks can be performed using high-level constructs, including Apache Hive SQL queries.
What is WANdisco LiveData Hive Plugin?
Consistent Hive metadata
The WANdisco LiveData Hive Plugin extends the capabilities of WANdisco LiveData Platform to allow your Hive infrastructure to participate fully in a LiveData environment. Give your Hadoop clusters a shared Hive metastore without the cost of single points of failure, degraded performance or administrative headaches. Replicate Hive metadata as it changes in any cluster, with strong consistency among all environments, and selective replication based on matching databases, tables and file system locations.
Always consistent queries
Share the same Hive definitions across multiple environments, regardless of where and when changes are made. Dramatically simplify the configuration of metadata replication with a LiveData strategy, so that all applications have access to the same Hive tables wherever they are required.
Guaranteed data consistency
Query your Hive data from any cluster with the same results every time, everywhere. Ingest data, alter tables, create new Hive representations and maintain consistent results at all times.
Never worry about periods when Hive representations differ among clusters because of periodic replication. Replicate your changes as they occur, without conflict among environments.
Recover from network or system outages automatically without the risk of introducing metadata inconsistencies. Accommodate your planned and unplanned outages with ease, and reduce administration costs.
Simple administration and integration
Extend an existing WANdisco LiveData Platform deployment with the WANdisco LiveData Hive Plugin without downtime or disruption. Take advantage of LiveData replication for Hive metadata without changing Hive applications or each cluster's Hive metastore. Use simple replication rules to define which Hive databases, tables and file system locations are replicated with strong consistency.
The plugin supports all major Hadoop distributions. Refer to the product documentation for more details.
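To give a feel for selective replication by matching databases and tables, here is a minimal Python sketch. It is an illustration only and does not show the plugin's actual rule syntax or API: the `ReplicationRule` class, the `is_replicated` function and the use of glob-style patterns are all assumptions made for this example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable


@dataclass(frozen=True)
class ReplicationRule:
    """Hypothetical rule: glob patterns for a Hive database and table."""
    database: str
    table: str = "*"  # by default, match every table in the database


def is_replicated(database: str, table: str, rules: Iterable[ReplicationRule]) -> bool:
    """Return True if at least one rule matches both the database and the table."""
    return any(
        fnmatchcase(database, rule.database) and fnmatchcase(table, rule.table)
        for rule in rules
    )


# Example rule set (names are made up for illustration).
rules = [
    ReplicationRule(database="sales"),                    # all tables in "sales"
    ReplicationRule(database="web_*", table="clicks_*"),  # only matching tables
]
```

With these rules, `is_replicated("sales", "orders", rules)` is true, while `is_replicated("web_eu", "sessions", rules)` is false because no rule matches that table.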
Apache Hive replication across cloud environments
WANdisco LiveData for Databricks provides an automated, risk-free answer to the challenges of replicating enterprise big data systems to the cloud. Together with Delta Lake running on Databricks, we provide a solution for organizations to take advantage of a unified analytics platform in the cloud without disrupting business operations.