This is a multipart series on Developing for Apache Kafka and Amazon MSK (Amazon Managed Streaming for Apache Kafka). Check out the other articles in the series on Apache Kafka.
The title sounds daunting, but we will guide you step by step through the process, and by the end of our journey you’ll see that building a distributed Apache Kafka development cluster isn’t so difficult.
Note: for this solution, Kafka was installed on Amazon Linux 2 virtual machines. Feel free to use whichever OS you are comfortable with.
Before you start, it will help if you are familiar with Apache Kafka zookeepers and Apache Kafka brokers and their relationship. Also, we will not address securing Apache Kafka as that is out of scope for this article.
Kafka depends on Java. Download and install a JDK on each machine, then run the following command to verify that Java was installed.
Warning: some AWS services may have fees associated with them.
$ java -version
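If Java is not yet installed, one option on Amazon Linux 2 is to install OpenJDK through amazon-linux-extras. The topic name below is an assumption based on common Amazon Linux 2 setups; verify what is available on your instance before running it.

```shell
# List available extras topics and confirm a Java topic exists
# (the java-openjdk11 name below is an assumption; check this list first).
amazon-linux-extras list

# Install OpenJDK 11 on this machine; repeat on every machine in the cluster.
sudo amazon-linux-extras install -y java-openjdk11

# Confirm the installation.
java -version
```

On a different OS, use that system's package manager or download a JDK directly from your vendor of choice.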
1. Download Apache Kafka
Head on over to apache.org and download Apache Kafka to each machine that you would like to use in your cluster. If you are working from the command line, consider using curl with the --output option.
$ curl https://apache.claz.org/kafka/2.6.0/kafka_2.13-2.6.0.tgz --output kafka.tgz
2. Extract Apache Kafka
On each machine, pick a location on your file system to run Apache Kafka and extract it using your favorite extraction tool. Here, we will use tar.
$ tar -xzf kafka.tgz
Now that we have Kafka extracted, let’s cd into the directory to get started using Kafka.
$ cd kafka_2.13-2.6.0
3. Start the Apache Kafka Zookeeper
It’s important to know that the zookeeper and the broker are both included in the Apache Kafka package that we just extracted to each machine.
The first step in starting the Kafka environment is to start the Kafka zookeeper. Only perform this on one machine. This machine will become the Apache Kafka Zookeeper. All other machines will function as an Apache Kafka Broker.
$ bin/zookeeper-server-start.sh config/zookeeper.properties
This command starts the zookeeper, passing in the configuration data from the file config/zookeeper.properties.
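For reference, the stock config/zookeeper.properties that ships with Kafka is short. A minimal version looks roughly like the following; the values shown are the shipped defaults at the time of writing, so verify them against your own copy.

```properties
# Directory where zookeeper stores its snapshot data.
# /tmp is fine for development but is cleared on reboot.
dataDir=/tmp/zookeeper
# Port on which clients (the Kafka brokers) connect.
clientPort=2181
# Disable the per-IP connection limit (non-production setting).
maxClientCnxns=0
```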
Important: record the IP address or the FQDN of the zookeeper machine; you will need it when configuring the brokers.
4. Configure the Apache Kafka Broker(s)
Open a terminal to each broker and edit the server.properties file with a text editor. In this example we will use nano.
$ nano config/server.properties
Edit the following properties.
broker.id: Must be an integer value that uniquely identifies the broker within the cluster.
zookeeper.connect: The IP address (or FQDN) and port of the zookeeper that you recorded earlier. The zookeeper listens on port 2181 by default; leave that port as-is.
advertised.listeners: The address that the broker advertises to clients and the other brokers. By default the broker listens on port 9092; leave that port as-is. See the Apache Kafka documentation for more about advertised.listeners.
5. Start the Apache Kafka Broker(s)
The next step is starting the broker(s). For each terminal that was used to configure a broker, run the following command.
$ bin/kafka-server-start.sh config/server.properties
This command starts each broker, passing in the configuration data from the file config/server.properties.
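Both start scripts run in the foreground and occupy the terminal. For a development cluster it can be handy to run them detached instead; the scripts accept a -daemon flag that backgrounds the process and writes output to the logs directory under the Kafka installation.

```shell
# Start the broker detached; output goes to the logs/ directory
# under the Kafka installation instead of the terminal.
bin/kafka-server-start.sh -daemon config/server.properties

# The zookeeper script accepts the same flag.
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
```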
That’s it! You now have a distributed Apache Kafka development cluster. Note, the model environment that was created has the following hosts:
Broker 1: 192.168.1.42:9092
Broker 2: 192.168.1.43:9092
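Using this model environment, and assuming the zookeeper machine is at 192.168.1.41 (an illustrative address; substitute the one you recorded earlier), the edited properties in Broker 1's server.properties would look roughly like:

```properties
# Unique integer ID for this broker; Broker 2 would use broker.id=2.
broker.id=1
# Address this broker advertises to clients and other brokers.
advertised.listeners=PLAINTEXT://192.168.1.42:9092
# Where to reach the zookeeper (default zookeeper port 2181).
# 192.168.1.41 is an assumed address for the zookeeper machine.
zookeeper.connect=192.168.1.41:2181
```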
Test Apache Kafka Environment
So, now that we have a distributed Apache Kafka development cluster up and running, what can we do with it? First, we will test it to make sure that everything is working. Luckily, Apache Kafka includes a command line producer and consumer that lets us easily test the installation.
1. Create an Apache Kafka Topic
Open a new terminal to one of the Apache Kafka brokers that you built and run the following command from the directory where you installed Apache Kafka.
$ bin/kafka-topics.sh --create --topic test-topic --bootstrap-server 192.168.1.42:9092
This command creates the topic, “test-topic”, using one of the Apache Kafka brokers that was started earlier. 9092 is the default port that the broker runs on.
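To confirm that the topic was created, the same script can describe it. This queries live cluster metadata, so run it against one of your running brokers.

```shell
# Show partition count, replication factor, and leader assignment
# for the topic we just created.
bin/kafka-topics.sh --describe --topic test-topic --bootstrap-server 192.168.1.42:9092
```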
2. Produce a message to the newly created Apache Kafka topic.
In that same directory, run the following command. This will open a REPL-like environment that produces messages to Apache Kafka through one of the brokers that was configured and started earlier.
$ bin/kafka-console-producer.sh --topic test-topic --bootstrap-server 192.168.1.43:9092
To produce a message to Apache Kafka, type in the terminal and hit Enter. Each line that you enter will be sent to the Apache Kafka topic (test-topic) that you just created. Use Ctrl+C to quit the REPL-like environment.
> Apache Kafka test record 1
> Apache Kafka test record 2
> Apache Kafka test record 3
3. Consume the messages that were sent to the newly created topic.
Open a new terminal to one of the Apache Kafka brokers and run the following command from the directory where you installed Apache Kafka. This command continually consumes messages from the Apache Kafka topic, “test-topic”, via one of the brokers that was started earlier (192.168.1.42:9092).
$ bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server 192.168.1.42:9092
You should see the following output.
Apache Kafka test record 1
Apache Kafka test record 2
Apache Kafka test record 3
That’s it. You have now built and tested an Apache Kafka development cluster.
Want to take this to the next level?
Create .NET applications that produce and consume from this Apache Kafka development cluster.