Apache Kafka
Apache Kafka[1]
Developer(s) Apache Software Foundation
Initial release January 2011 (2011-01)[2]
Stable release 1.0 / November 1, 2017 (2017-11-01)
Repository git-wip-us.apache.org/repos/asf/kafka.git
Development status Active
Written in Scala, Java
Operating system Cross-platform
Type Stream processing, Message broker
License Apache License 2.0
Website kafka.apache.org

Apache Kafka is an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log,"[3] making it highly valuable for enterprise infrastructures that process streaming data. Additionally, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream-processing library.

The design is heavily influenced by transaction logs.[4]

History

Apache Kafka was originally developed by LinkedIn and was subsequently open-sourced in early 2011. The project graduated from the Apache Incubator on 23 October 2012. In November 2014, several engineers who had worked on Kafka at LinkedIn created a new company named Confluent[5] with a focus on Kafka. According to a 2014 Quora post, Jay Kreps seems to have named it after the author Franz Kafka. Kreps chose to name the system after an author because it is "a system optimized for writing", and he liked Kafka's work.[6]

Apache Kafka Architecture

[Diagram: Overview of Kafka]

Kafka stores messages that come from arbitrarily many processes called "producers". The data can be partitioned into "partitions" within different "topics". Within a partition, messages are indexed and stored together with a timestamp. Other processes called "consumers" can read messages from partitions. Kafka runs on a cluster of one or more servers, and partitions can be distributed across cluster nodes.
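The following minimal Java sketch illustrates this model with the standard Kafka client library: a producer appends a record to a topic, and a consumer subscribes to the same topic and polls for records. The broker address (localhost:9092), the topic name ("example-topic") and the consumer group id are illustrative assumptions, not values prescribed by Kafka.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaExample {
        public static void main(String[] args) {
            // Producer: append a key/value record to a topic.
            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                producer.send(new ProducerRecord<>("example-topic", "key-1", "hello, kafka"));
            } // close() flushes any pending sends

            // Consumer: subscribe to the topic and poll for records.
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "example-group"); // illustrative consumer group
            consumerProps.put("auto.offset.reset", "earliest");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
                consumer.subscribe(Collections.singletonList("example-topic"));
                // poll(long) as in the 1.0 client; the first poll may be empty while the group is assigned
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // Each record carries its partition offset and timestamp, as described above.
                    System.out.printf("offset=%d timestamp=%d key=%s value=%s%n",
                            record.offset(), record.timestamp(), record.key(), record.value());
                }
            }
        }
    }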

Apache Kafka processes real-time streaming data efficiently when implemented alongside Apache Storm, Apache HBase and Apache Spark. Deployed as a cluster on multiple servers, Kafka handles its entire publish-and-subscribe messaging system with the help of four APIs: the producer API, consumer API, streams API and connector API. Its ability to deliver massive streams of messages in a fault-tolerant fashion has made it a replacement for conventional messaging systems such as JMS and AMQP.

The major terms of Kafka's architecture are topics, records, and brokers. Topics consist of streams of records holding different kinds of information, while brokers are responsible for replicating the messages. There are four major APIs in Kafka:

  • Producer API - Permits an application to publish streams of records to one or more topics.
  • Consumer API - Permits an application to subscribe to one or more topics and process the streams of records produced to them.
  • Streams API - Consumes an input stream from one or more topics and produces an output stream to one or more topics, transforming the records along the way (a minimal sketch follows this list).
  • Connector API - Provides reusable producers and consumers that connect topics to existing applications or data systems.
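As an illustration of the Streams API, the Java sketch below (targeting the 1.0 client listed above) builds a small topology that reads records from one topic, upper-cases each value, and writes the result to another topic. The application id, broker address and topic names are illustrative assumptions.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class UppercaseStream {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-example"); // illustrative id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // Build a topology: input topic -> transform values -> output topic.
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> input = builder.stream("input-topic");
            input.mapValues(value -> value.toUpperCase()).to("output-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();

            // Close the streams instance cleanly on shutdown.
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }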

Kafka performance

Due to its widespread integration into enterprise-level infrastructures, monitoring Kafka performance at scale has become an increasingly important issue. Monitoring end-to-end performance requires tracking metrics from brokers, consumers, and producers, in addition to monitoring ZooKeeper, which Kafka uses for coordination among consumers.[7][8] There are currently several monitoring platforms that track Kafka performance, both open-source, like LinkedIn's Burrow, and paid, like Datadog. In addition to these platforms, Kafka data can also be collected using tools commonly bundled with Java, including JConsole.[9]
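Kafka exposes these metrics over JMX, which is what JConsole and similar tools read. As a minimal sketch, the Java fragment below connects to a broker's JMX endpoint and reads one broker-level throughput meter; it assumes the broker was started with remote JMX enabled on port 9999 (for example via the JMX_PORT environment variable), which is an illustrative setup rather than a default.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BrokerMetricSample {
        public static void main(String[] args) throws Exception {
            // Assumed JMX endpoint of a single broker (host and port are illustrative).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection connection = connector.getMBeanServerConnection();
                // Broker-level message throughput meter exposed by Kafka.
                ObjectName messagesIn = new ObjectName(
                        "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
                Object rate = connection.getAttribute(messagesIn, "OneMinuteRate");
                System.out.println("MessagesInPerSec (1-minute rate): " + rate);
            }
        }
    }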

Enterprises that use Kafka

The following is a list of notable enterprises that have used or are using Kafka:

See also

References

  1. ^ "Mirror of Apache Kafka at GitHub]". github.com. Retrieved 2017. 
  2. ^ "Open-sourcing Kafka, LinkedIn's distributed message queue". Retrieved 2016. 
  3. ^ Monitoring Kafka performance metrics, Datadog Engineering Blog, accessed 23 May 2016.
  4. ^ The Log: What every software engineer should know about real-time data's unifying abstraction, LinkedIn Engineering Blog, accessed 5 May 2014
  5. ^ Primack, Dan. "LinkedIn engineers spin out to launch 'Kafka' startup Confluent". fortune.com. Retrieved 2015. 
  6. ^ "What is the relation between Kafka, the writer, and Apache Kafka, the distributed messaging system?". Quora. Retrieved . 
  7. ^ "Monitoring Kafka performance metrics". 2016-04-06. Retrieved . 
  8. ^ Mouzakitis, Evan (2016-04-06). "Monitoring Kafka performance metrics". datadoghq.com. Retrieved . 
  9. ^ "Collecting Kafka performance metrics - Datadog". 2016-04-06. Retrieved . 
  10. ^ "Exchange Market Data Streaming with Kafka". betsandbits.com. Archived from the original on 2016-05-28. 
  11. ^ "OpenSOC: An Open Commitment to Security". Cisco blog. Retrieved . 
  12. ^ "More data, more data". 
  13. ^ "Conviva home page". Conviva. 2017-02-28. Retrieved . 
  14. ^ Doyung Yoon. "S2Graph : A Large-Scale Graph Database with HBase". 
  15. ^ "Kafka Usage in Ebay Communications Delivery Pipeline". 
  16. ^ "Cryptography and Protocols in Hyperledger Fabric" (PDF). January 2017. Retrieved . 
  17. ^ "Kafka at HubSpot: Critical Consumer Metrics". 
  18. ^ Cheolsoo Park and Ashwin Shankar. "Netflix: Integrating Spark at Petabyte Scale". 
  19. ^ Boerge Svingen. "Publishing with Apache Kafka at The New York Times". Retrieved . 
  20. ^ Shibi Sudhakaran of PayPal. "PayPal: Creating a Central Data Backbone: Couchbase Server to Kafka to Hadoop and Back (talk at Couchbase Connect 2015)". Couchbase. Retrieved . 
  21. ^ "Shopify - Sarama is a Go library for Apache Kafka". 
  22. ^ Josh Baer. "How Apache Drives Spotify's Music Recommendations". 
  23. ^ Patrick Hechinger. "CTOs to Know: Meet Ticketmaster's Jody Mulkey". 
  24. ^ "Stream Processing in Uber". InfoQ. Retrieved . 
  25. ^ "Apache Kafka for Item Setup". medium.com. Retrieved . 
  26. ^ "Streaming Messages from Kafka into Redshift in near Real-Time". Yelp. Retrieved . 

External links


  This article uses material from the Wikipedia page available here. It is released under the Creative Commons Attribution-Share-Alike License 3.0.

