Performance Evaluation of Apache Kafka – A Modern Platform for Real Time Data Streaming

Shubham Vyas, Rajesh Kumar Tyagi, Charu Jain, Shashank Sahu
DOI: https://doi.org/10.1109/iciptm54933.2022.9754154
2022-02-23
Abstract: Today's businesses increasingly demand timely availability of data. Many real-time data streaming tools and technologies are capable of meeting these expectations. Apache Kafka is one such open-source, distributed, scalable technology that enables real-time data streaming with good throughput and latency. In traditional batch processing, data is processed in groups or batches, whereas in streaming services data records are handled individually and the flow of data processing is continuous and real-time. Once data is available at the source, Kafka can detect it and stream it in real time to the target application. A literature survey showed that too few experiments have been conducted so far across a variety of data volumes and with different numbers of partitions and polling intervals. The purpose of this study is to elaborate on Apache Kafka implementation and evaluate its performance. The study analyses key performance indicators for the streaming platform and derives useful insights from them. These insights will help in designing optimized applications on Apache Kafka. Based on the gaps identified in the literature survey, multiple experiments were conducted for the producer and consumer APIs (Application Programming Interfaces). A Kafka setup configured with Apache ZooKeeper was used to derive the results, which are captured in tabular form for different values of polling interval, volume, and number of partitions. Data from all test runs were analysed further to draw the conclusions presented in the results section. This study provides valuable insights into CPU (Central Processing Unit) and memory utilization for Apache Kafka streaming under changing volumes, and also elaborates on the impact on streaming performance when key configurations are changed.