Saturday, December 30, 2017

Oracle Linux - Install Apache Kafka

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log," making it highly valuable for enterprise infrastructures that process streaming data. Additionally, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library.



In this blog post we will install Apache Kafka on Oracle Linux. The installation is a test setup that is not ready for production environments, but it can very well be used to explore Apache Kafka running on Oracle Linux.

Apache Kafka is also provided as a service from the Oracle Cloud in the form of the Oracle Cloud Event Hub. This provides you with a running Kafka installation that can be used directly from the start. The below video shows the highlights of this service in the Oracle Cloud.


In this example we will not use the Event Hub service from the Oracle Cloud; we will install Kafka from the ground up. This can be done on a local Oracle Linux installation or on an Oracle Linux installation in the Oracle Cloud, making use of the Oracle IaaS components in the Oracle Cloud.

Prepare the system for installation
In essence, the most important step is to ensure you have Java installed on your machine. The below steps outline how this should be done on Oracle Linux.

You can install the Java OpenJDK using YUM and the standard Oracle Linux repositories.

yum install java-1.8.0-openjdk.x86_64

You should now be able to verify that Java is installed, as shown in the example below.

[root@localhost /]# java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
[root@localhost /]#

This does not, however, set JAVA_HOME and JRE_HOME as environment variables. To set them, make sure the following two lines are present in /etc/profile.

export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
export JRE_HOME=/usr/lib/jvm/jre

After you have made the changes to this file, reload the profile by issuing a source /etc/profile command. This will ensure that the JRE_HOME and JAVA_HOME environment variables are loaded in the correct manner.

[root@localhost ~]# source /etc/profile
[root@localhost ~]#
[root@localhost ~]# env | grep jre
JRE_HOME=/usr/lib/jvm/jre
JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
[root@localhost ~]#
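
The env output only shows that the variables are set, not that they point at a working Java installation. The sketch below adds that check. Note that check_java_home is a hypothetical helper name used for illustration; it is not part of Java, Kafka or Oracle Linux.

```shell
# check_java_home DIR
# Succeeds only if DIR is non-empty and contains an executable bin/java.
# (Hypothetical helper, not shipped with Java or Kafka.)
check_java_home() {
    local home="$1"
    if [ -z "$home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$home/bin/java" ]; then
        echo "no executable found at $home/bin/java" >&2
        return 1
    fi
    echo "JAVA_HOME looks valid: $home"
}

# Example call, typically after `source /etc/profile`:
#   check_java_home "$JAVA_HOME"
```

A non-zero return value tells you the path in /etc/profile needs another look before you continue with the Kafka installation.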

Downloading Kafka for installation
Before following the below instructions, it is good practice to check which Apache Kafka version is the latest stable release and download that one. In our case we download the file kafka_2.11-1.0.0.tgz for the version we want to install in our example installation.

[root@localhost /]# cd /tmp
[root@localhost tmp]# wget http://www-us.apache.org/dist/kafka/1.0.0/kafka_2.11-1.0.0.tgz
--2017-12-27 13:35:51--  http://www-us.apache.org/dist/kafka/1.0.0/kafka_2.11-1.0.0.tgz
Resolving www-us.apache.org (www-us.apache.org)... 140.211.11.105
Connecting to www-us.apache.org (www-us.apache.org)|140.211.11.105|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49475271 (47M) [application/x-gzip]
Saving to: ‘kafka_2.11-1.0.0.tgz’

100%[======================================>] 49,475,271  2.89MB/s   in 16s 

2017-12-27 13:36:09 (3.01 MB/s) - ‘kafka_2.11-1.0.0.tgz’ saved [49475271/49475271]

[root@localhost tmp]# ls -la *.tgz
-rw-r--r--. 1 root root 49475271 Nov  1 05:39 kafka_2.11-1.0.0.tgz
[root@localhost tmp]#

You can untar the downloaded file with tar -xvf kafka_2.11-1.0.0.tgz and then move it to the location where you want to place Apache Kafka. In our case we want to place Kafka in /opt/kafka, so we undertake the below actions:

[root@localhost tmp]# mkdir /opt/kafka
[root@localhost tmp]#
[root@localhost tmp]# cd /tmp/kafka_2.11-1.0.0
[root@localhost kafka_2.11-1.0.0]# cp -r * /opt/kafka
[root@localhost kafka_2.11-1.0.0]# ls -la /opt/kafka/
total 48
drwxr-xr-x. 6 root root    83 Dec 27 13:39 .
drwxr-xr-x. 4 root root    50 Dec 27 13:39 ..
drwxr-xr-x. 3 root root  4096 Dec 27 13:39 bin
drwxr-xr-x. 2 root root  4096 Dec 27 13:39 config
drwxr-xr-x. 2 root root  4096 Dec 27 13:39 libs
-rw-r--r--. 1 root root 28824 Dec 27 13:39 LICENSE
-rw-r--r--. 1 root root   336 Dec 27 13:39 NOTICE
drwxr-xr-x. 2 root root    43 Dec 27 13:39 site-docs
[root@localhost kafka_2.11-1.0.0]#
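
The untar, mkdir and copy steps above can also be collapsed into a single tar call. The below is a minimal sketch, assuming GNU tar as found on Oracle Linux; install_tarball is a hypothetical helper name and not something that ships with Kafka.

```shell
# install_tarball ARCHIVE TARGET_DIR
# Extracts ARCHIVE into TARGET_DIR and drops the archive's top-level
# directory (for example kafka_2.11-1.0.0/), so that bin/, config/ and
# libs/ land directly in TARGET_DIR.
install_tarball() {
    local archive="$1" target="$2"
    mkdir -p "$target" &&
        tar -xzf "$archive" -C "$target" --strip-components=1
}

# Example call, matching the paths used in this post:
#   install_tarball /tmp/kafka_2.11-1.0.0.tgz /opt/kafka
```

This avoids leaving a second copy of the files behind in /tmp.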

Start Apache Kafka
The above steps should have placed Apache Kafka on your Oracle Linux system. Now we have to start it and test that it works. Before we can start Kafka on our Oracle Linux system we first have to ensure we have ZooKeeper up and running. To do so, execute the below command in the /opt/kafka directory.

bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
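
Because the -daemon flag returns control to the shell right away, it can be useful to wait until ZooKeeper actually accepts connections before starting Kafka. The below is a rough sketch of such a wait loop. It assumes you run it in bash (the /dev/tcp path is a bash feature) and that ZooKeeper listens on its default client port 2181; wait_for_port is a hypothetical helper name.

```shell
# wait_for_port HOST PORT [TRIES]
# Tries once per second to open a TCP connection to HOST:PORT and gives
# up after TRIES attempts (default 30). Returns 0 as soon as one succeeds.
wait_for_port() {
    local host="$1" port="$2" tries="${3:-30}" i=0
    while [ "$i" -lt "$tries" ]; do
        # The subshell opens the socket on fd 3 and closes it again on exit.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example call after starting ZooKeeper:
#   wait_for_port localhost 2181 && echo "ZooKeeper is accepting connections"
```

An open port only tells you that something is listening; it does not prove that ZooKeeper is healthy.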

Depending on the sizing of your machine you might want to change some things in the startup script for Apache Kafka. When, as in my case, you deploy Apache Kafka on an Oracle Linux test machine you might not have as much memory allocated to the test machine as you might have on a "real" server. The below line is present in the bin/kafka-server-start.sh file and sets the memory heap size that should be used.

    export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"

In our case we lowered the initial heap size (-Xms) to 128 MB and left the maximum heap size (-Xmx) at 1 GB. This is more than adequate for testing purposes but might be far too little when deploying a production system. The below is an example of the setting as used for this test:

    export KAFKA_HEAP_OPTS="-Xmx1G -Xms128M"
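
As an alternative to editing the script, the same value can usually be passed in from the environment. This is a sketch that rests on one assumption: that in your copy of bin/kafka-server-start.sh the export line is wrapped in a check which only applies the default when KAFKA_HEAP_OPTS is still empty. Please verify this in your own copy before relying on it.

```shell
# Assumption: bin/kafka-server-start.sh only applies its own default when
# KAFKA_HEAP_OPTS is empty. Check your copy of the script first.
export KAFKA_HEAP_OPTS="-Xmx1G -Xms128M"
echo "Kafka will be started with: $KAFKA_HEAP_OPTS"

# Then, from the /opt/kafka directory in the same shell:
#   bin/kafka-server-start.sh config/server.properties
```

The advantage is that the Kafka distribution stays unmodified, so an upgrade does not silently undo the setting.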


This should enable you to start Apache Kafka for the first time as a test on Oracle Linux. You can start Apache Kafka from the /opt/kafka directory using the below command:

bin/kafka-server-start.sh config/server.properties

You should be able to see a trail of messages from the startup routine and, if all has gone right, the last message should be the one shown below. This should be an indication that Kafka is up and running.

INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
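
When you start the broker in the background (kafka-server-start.sh accepts the same -daemon flag we used for ZooKeeper), the startup messages are no longer shown on screen and you have to look for this line in the log file instead. The sketch below does that. It assumes the default logging setup, which writes to logs/server.log under the Kafka directory; broker_started is a hypothetical helper name.

```shell
# broker_started LOGFILE
# Succeeds if LOGFILE contains the broker's "started" message.
broker_started() {
    # -F treats the pattern as a fixed string, so the dots and
    # parentheses are matched literally.
    grep -qF 'started (kafka.server.KafkaServer)' "$1"
}

# Example, from the /opt/kafka directory:
#   bin/kafka-server-start.sh -daemon config/server.properties
#   broker_started logs/server.log && echo "Kafka is up"
```

Keep in mind that the log file can still contain the line from an earlier run, so an empty or fresh log gives the most reliable result.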

Testing Kafka
As we now have Apache Kafka up and running we could (should) test if Apache Kafka is working as expected. Kafka comes with a number of scripts that make testing easier. The below scripts are useful when starting to test (or debug) Apache Kafka:

bin/kafka-topics.sh
Used for taking actions on topics, for example creating a new topic.

bin/kafka-console-producer.sh
Used in the role of producer, for sending event messages.

bin/kafka-console-consumer.sh
Used in the role of consumer, for receiving event messages.

The first step in testing is to ensure we have a topic in Apache Kafka to publish event messages to. For this we can use the kafka-topics.sh script. We will create the topic "test" as shown below:

[vagrant@localhost kafka]$
[vagrant@localhost kafka]$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Created topic "test".
[vagrant@localhost kafka]$

To ensure the topic is really available in Apache Kafka we list the available topics with the below command:

[vagrant@localhost kafka]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
test
[vagrant@localhost kafka]$

Having a topic in Apache Kafka should enable you to start producing messages as a producer. The below example showcases starting the kafka-console-producer.sh script. This will give you an interactive command line where you can type messages.

[vagrant@localhost kafka]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
hello 1
hello 2
thisisatest
this is a test
{test 123}

[vagrant@localhost kafka]$

As the whole idea is to produce messages on one side and receive them on the other side, we will also have to test the subscriber side of Apache Kafka to see if we can receive the messages on the topic "test" as a subscriber. The below command subscribes to the topic "test". The --from-beginning option indicates that we want to receive not only event messages that are created from this moment on, but all messages from the beginning (creation) of the topic (as far as they are available in Apache Kafka).

[vagrant@localhost kafka]$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
hello 1
hello 2
thisisatest
this is a test
{test 123}

As you can see, the event messages we created as the producer are received by the consumer (subscriber) on the test topic.
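
The same round trip can also be done without typing anything, which is handy when you want to repeat the test. The below is a minimal sketch: it pipes one uniquely named message into the console producer and then checks that the console consumer returns it. The function name smoke_test and the KAFKA_HOME variable are assumptions made for this example, and the 10 second value for --timeout-ms is an arbitrary choice.

```shell
# Location of the Kafka installation (assumed; /opt/kafka is used in this post).
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"

# smoke_test TOPIC
# Sends one unique message to TOPIC and succeeds only if the consumer
# reads that exact message back.
smoke_test() {
    local topic="$1" msg
    msg="smoke-test-$(date +%s)"

    # The console producer reads messages from standard input, one per line.
    echo "$msg" | "$KAFKA_HOME/bin/kafka-console-producer.sh" \
        --broker-list localhost:9092 --topic "$topic" >/dev/null

    # --timeout-ms makes the consumer exit when no new message arrives in time.
    "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
        --bootstrap-server localhost:9092 --topic "$topic" \
        --from-beginning --timeout-ms 10000 2>/dev/null |
        grep -qxF "$msg"
}

# Example call:
#   smoke_test test && echo "round trip OK"
```

The return value is that of grep, so a non-zero result means the message did not come back within the timeout.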
