|Original author(s)||Eric Tschetter, Fangjin Yang|
|Developer(s)||The Druid community|
|Stable release||0.12.3 / 18 September 2018|
|Type||distributed, real-time, column-oriented data store|
|License||Apache License 2.0|
Druid is a column-oriented, open-source, distributed data store written in Java. Druid is designed to quickly ingest massive quantities of event data and to provide low-latency queries on top of that data. The name Druid comes from the shape-shifting Druid character class found in many role-playing games, reflecting the fact that the system's architecture can shift to solve different types of data problems.
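To illustrate what "column-oriented" means for an analytic query, the minimal sketch below pivots a few event rows into per-column lists and then sums one metric grouped by one dimension, reading only those two columns. The field names (`page`, `country`, `added`) and the helper functions are hypothetical; this shows the general columnar idea, not Druid's actual segment format, which adds compression, dictionary encoding, and bitmap indexes.

```python
from collections import defaultdict

# Hypothetical event rows; field names are illustrative only.
rows = [
    {"timestamp": "2018-09-18T00:00:00Z", "page": "Druid", "country": "US", "added": 10},
    {"timestamp": "2018-09-18T00:00:00Z", "page": "Java", "country": "DE", "added": 3},
    {"timestamp": "2018-09-18T01:00:00Z", "page": "Druid", "country": "DE", "added": 7},
]

def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)

def sum_by(columns, group_col, metric_col):
    """Sum one metric grouped by one dimension, touching only those two columns."""
    totals = defaultdict(int)
    for key, value in zip(columns[group_col], columns[metric_col]):
        totals[key] += value
    return dict(totals)

columns = to_columns(rows)
print(sum_by(columns, "page", "added"))  # {'Druid': 17, 'Java': 3}
```

Because the aggregation never reads `timestamp` or `country`, a columnar layout lets a query skip data it does not need, which is one reason such stores suit low-latency analytics.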
Druid is commonly used in business intelligence/OLAP applications to analyze high volumes of real-time and historical data. Druid is used in production by technology companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo, as well as by the Wikimedia Foundation.
Druid was started in 2011 to power the analytics product of a company named Metamarkets. The project was open-sourced under the GPL license in October 2012 and moved to the Apache License in February 2015.
In October 2018, Spicule Ltd released a supported version of Druid on the Juju platform from Canonical.
Fully deployed, Druid runs as a cluster of specialized processes (called nodes in Druid) in a fault-tolerant architecture in which data is stored redundantly and there is no single point of failure. The cluster relies on external dependencies for coordination (Apache ZooKeeper), metadata storage (e.g. MySQL, PostgreSQL, or Derby), and a deep storage facility (e.g. HDFS or Amazon S3) for permanent data backup.
Client queries first hit broker nodes, which forward them to the appropriate data nodes (either historical or real-time). Since Druid segments may be partitioned, an incoming query can require data from multiple segments and partitions (or shards) stored on different nodes in the cluster. Brokers learn which nodes hold the required data, and they merge the partial results before returning the final aggregated result.
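The scatter-gather pattern described above can be sketched as follows, under simplifying assumptions: the segment-to-node map and the segment contents are hard-coded dictionaries (in a real cluster the broker learns segment locations from the cluster state), each data node is simulated by a local function, and the query is a simple sum grouped by one dimension. All segment and node names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster state: which data node serves each segment,
# and the (dimension value, metric) pairs stored in each segment.
SEGMENT_LOCATIONS = {
    "wiki_2018-09-18_part0": "historical-1",
    "wiki_2018-09-18_part1": "historical-2",
    "wiki_2018-09-19_part0": "realtime-1",
}
SEGMENT_DATA = {
    "wiki_2018-09-18_part0": [("Druid", 10), ("Java", 3)],
    "wiki_2018-09-18_part1": [("Druid", 7)],
    "wiki_2018-09-19_part0": [("Java", 5), ("Druid", 1)],
}

def query_node(node, segments):
    """Simulate one data node computing a partial sum over the segments it serves."""
    partial = Counter()
    for segment in segments:
        for key, value in SEGMENT_DATA[segment]:
            partial[key] += value
    return partial

def broker_query(segments_needed):
    """Group the required segments by node, query the nodes in parallel,
    then merge the partial results into one aggregated answer."""
    by_node = {}
    for segment in segments_needed:
        by_node.setdefault(SEGMENT_LOCATIONS[segment], []).append(segment)
    merged = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(lambda item: query_node(*item), by_node.items()):
            merged.update(partial)  # Counter.update adds counts together
    return dict(merged)

print(broker_query(list(SEGMENT_LOCATIONS)))  # {'Druid': 18, 'Java': 8}
```

Merging works here because a sum is decomposable: partial sums from each node can be added in any order to give the same total as scanning all segments in one place.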
Operations relating to data management in historical nodes are overseen by coordinator nodes. Apache ZooKeeper is used to register all nodes, manage certain aspects of internode communications, and provide for leader elections.
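As a rough illustration of how a coordinator-like process can keep data stored redundantly across historical nodes, the sketch below assigns each segment to a fixed number of distinct nodes, always choosing the currently least-loaded ones. The function names, the `replicants` parameter, and the greedy policy are assumptions made for illustration; Druid's actual coordinator applies its own configurable load rules and balancing strategy, which this does not reproduce.

```python
import heapq

def assign_segments(segments, historicals, replicants=2):
    """Hypothetical greedy policy: place each segment on `replicants`
    distinct historical nodes, picking the least-loaded nodes first."""
    if replicants > len(historicals):
        raise ValueError("not enough historical nodes for the requested replication")
    # Min-heap of (segments loaded so far, node name); each node appears once.
    load = [(0, node) for node in historicals]
    heapq.heapify(load)
    assignment = {}
    for segment in segments:
        # Popping distinct heap entries guarantees distinct nodes per segment.
        chosen = [heapq.heappop(load) for _ in range(replicants)]
        assignment[segment] = [node for _, node in chosen]
        for count, node in chosen:
            heapq.heappush(load, (count + 1, node))
    return assignment

def survives_failure(assignment, failed_node):
    """Check that every segment still has a replica if one node is lost."""
    return all(any(n != failed_node for n in nodes) for nodes in assignment.values())

plan = assign_segments(["s1", "s2", "s3"], ["h1", "h2", "h3"], replicants=2)
print(plan)  # {'s1': ['h1', 'h2'], 's2': ['h3', 'h1'], 's3': ['h2', 'h3']}
print(all(survives_failure(plan, n) for n in ["h1", "h2", "h3"]))  # True
```

With two replicas per segment, losing any single historical node still leaves a copy of every segment online, which is the property the redundant layout is meant to provide.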