I’m truly shocked by how difficult this information is to find in one place. Maybe that’s because AWS has their own version of Kafka functionality.
At any rate, after much reading and irritation I have it working. There is still some work to do securing Zookeeper and adding ACLs to Kafka, but we’ll get there later.
Tip 1: Use an Elastic IP per Kafka broker and give it a DNS entry. We’re using Route53 so that’s pretty easy.
Tip 2: Put a complete list of your Kafka brokers and their INTERNAL IP addresses in /etc/hosts on each broker, matching their DNS hostnames.
Tip 3: Edit the network settings on your brokers so each box’s hostname matches its DNS entry.
Tip 4: Bounce the box after this to make sure the hostname change sticks.
Tip 5: Do NOT use underscores in your hostname.
All of this is to ensure Zookeeper can figure out WTF is going on. It won’t let you tell it directly… 🙁
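Tips 2 and 5 are easy to script. Here is a rough sketch, assuming three brokers named like ours and made-up private IPs (the 10.0.x.10 addresses and the function names are placeholders I invented for illustration, so swap in your own):

```bash
#!/usr/bin/env bash
# Hypothetical broker list as "private-ip hostname" pairs.
# These addresses are placeholders, not values from a real cluster.
BROKERS=(
  "10.0.1.10 kb1.mydomain.com"
  "10.0.2.10 kb2.mydomain.com"
  "10.0.3.10 kb3.mydomain.com"
)

# Succeeds only if the hostname contains no underscore (Tip 5).
hostname_ok() {
  [[ "$1" != *_* ]]
}

# Print one /etc/hosts line per broker (Tip 2), refusing bad names.
hosts_entries() {
  local entry ip name
  for entry in "${BROKERS[@]}"; do
    read -r ip name <<< "$entry"
    if ! hostname_ok "$name"; then
      echo "underscore in hostname: $name" >&2
      return 1
    fi
    # Full DNS name first, then the short name as an alias.
    printf '%s %s %s\n' "$ip" "$name" "${name%%.*}"
  done
}
```

Something like `hosts_entries | sudo tee -a /etc/hosts` on each broker would then append the entries.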
We have a 3 node cluster running across 3 AZs in our VPC. Single node Kafka is a lot easier. The following settings for server.properties are what make this work for us:
Make sure broker.id is set to a unique # for each broker
auto.leader.rebalance.enable=true — enables better shutdown/restart experiences
controlled.shutdown.enable=true — helps with this as well
listeners=SSL://__HOST__:9093 — This is for your specific broker. We disabled PLAINTEXT as an option.
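advertised.listeners=SSL://__HOST__:9093
host.name=__HOST__
advertised.host.name=__HOST__
The last two (host.name and advertised.host.name) are the deprecated ones. listeners and advertised.listeners replace them, and newer Kafka releases (3.0 and later) have removed the old pair, so leave them out there.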
We wanted to keep everything forever. We did that with a really high value, log.retention.hours=2147483647, though the Kafka docs also say that log.retention.ms=-1 removes the time limit outright.
For the SSL stuff:
ssl.keystore.location=/root/jks/kafka.server.keystore.jks
ssl.keystore.password=
ssl.key.password=
ssl.truststore.location=/root/jks/kafka.server.truststore.jks
ssl.truststore.password=
security.inter.broker.protocol=SSL
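Since a typo in any of these quietly breaks the cluster, a quick pre-flight check on the rendered file can save some grief. This is only a sketch: `prop` and `check_props` are names I made up, and the rules simply mirror the settings listed above.

```bash
#!/usr/bin/env bash
# prop FILE KEY: print the value of KEY from a properties file
# (the last assignment wins, empty output if the key is missing).
prop() {
  awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$1"
}

# check_props FILE: return non-zero if the file breaks any of the rules below.
check_props() {
  local file=$1 rc=0
  [[ "$(prop "$file" broker.id)" =~ ^[0-9]+$ ]] \
    || { echo "broker.id must be a number" >&2; rc=1; }
  [[ "$(prop "$file" listeners)" == SSL://* ]] \
    || { echo "listeners should start with SSL://" >&2; rc=1; }
  [[ "$(prop "$file" security.inter.broker.protocol)" == SSL ]] \
    || { echo "security.inter.broker.protocol should be SSL" >&2; rc=1; }
  # A leftover __NAME__ placeholder means the template was never filled in.
  if grep -q '__[A-Z][A-Z]*__' "$file"; then
    echo "unreplaced template placeholder found" >&2
    rc=1
  fi
  return "$rc"
}
```

Running `check_props /root/kafka_current/config/server.properties` before starting the broker would catch the obvious mistakes.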
There is a whole lot to learn about generating and signing CA keys and building keystores for Java. I don’t have the energy, so I’ll give you the link:
https://kafka.apache.org/documentation/#security
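For the impatient, the steps on that page boil down to roughly the following. Treat it as a sketch, not what we actually ran: the function names, the kafka-ca subject, the one-year validity, and reusing a single password everywhere are all my assumptions, and the output file names are chosen to match the paths above.

```bash
#!/usr/bin/env bash
# Illustrative validity period, pick your own.
VALIDITY_DAYS=365

# make_ca PASSWORD: create a self-signed CA (writes ca-key and ca-cert).
make_ca() {
  openssl req -new -x509 -keyout ca-key -out ca-cert -days "$VALIDITY_DAYS" \
    -subj "/CN=kafka-ca" -passout "pass:$1"
}

# make_broker_stores HOST PASSWORD: build a keystore and truststore for one
# broker, signed by the CA files in the current directory.
make_broker_stores() {
  local host=$1 pass=$2
  local ks=kafka.server.keystore.jks ts=kafka.server.truststore.jks
  # 1. Key pair whose CN is the broker's DNS name.
  keytool -genkeypair -keystore "$ks" -storetype JKS -alias localhost \
    -validity "$VALIDITY_DAYS" -keyalg RSA -dname "CN=$host" \
    -storepass "$pass" -keypass "$pass"
  # 2. Certificate signing request for that key.
  keytool -certreq -keystore "$ks" -alias localhost -file cert-file \
    -storepass "$pass"
  # 3. Sign the request with the CA.
  openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed \
    -days "$VALIDITY_DAYS" -CAcreateserial -passin "pass:$pass"
  # 4. Import the CA cert, then the signed cert, into the keystore.
  keytool -importcert -keystore "$ks" -alias CARoot -file ca-cert \
    -storepass "$pass" -noprompt
  keytool -importcert -keystore "$ks" -alias localhost -file cert-signed \
    -storepass "$pass" -noprompt
  # 5. The truststore only needs to trust the CA.
  keytool -importcert -keystore "$ts" -storetype JKS -alias CARoot \
    -file ca-cert -storepass "$pass" -noprompt
}
```

You would run `make_ca` once, then `make_broker_stores kb1.mydomain.com <password>` per broker, and copy the two .jks files to wherever ssl.keystore.location and ssl.truststore.location point.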
Kafka has “Rack Awareness” built in, so this is kinda cool. I use a bash script to start my brokers, which allows me to use a template and fill in the rack location using stuff that EC2 instances know about themselves.
In the template set broker.rack=__AZ__ (no quotes, since a properties file keeps quote characters as part of the value)
and in your bash script do something like this:
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
sed -e "s/__ID__/${ID}/g;s/__HOST__/${HOST}/g;s/__AZ__/${AZ}/g" /root/kafka_current/config/server.properties.tmp > /root/kafka_current/config/server.properties
I embed the Broker ID in the hostname so I can do stuff like this (these lines go above the sed line, since it uses ${ID} and ${HOST}):
ID=$(hostname | cut -f 1 -d. | cut -f 2 -d "b")
HOST=$(hostname)
So my broker names are kb1.mydomain.com, kb2.mydomain.com, etc.
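Put together, the start script might look something like the sketch below. The function names are mine, not anything Kafka or AWS ships, and I’m assuming the instance metadata service wants a session token (IMDSv2), which the one-liner above doesn’t bother with. The ID here is taken from the trailing digits of the short hostname rather than via cut.

```bash
#!/usr/bin/env bash
KAFKA_CONF=/root/kafka_current/config

# Trailing digits of the short hostname: kb1.mydomain.com -> 1.
# Prints an empty string if the name does not end in digits.
broker_id_from_host() {
  local short=${1%%.*}
  echo "${short##*[!0-9]}"
}

# Ask the EC2 instance metadata service (IMDSv2) for this instance's AZ.
fetch_az() {
  local token
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/placement/availability-zone"
}

# render_properties TEMPLATE OUT ID HOST AZ: fill in the placeholders.
render_properties() {
  sed -e "s/__ID__/$3/g;s/__HOST__/$4/g;s/__AZ__/$5/g" "$1" > "$2"
}

# Render the real config. Nothing above touches the network or the
# filesystem until this is called.
main() {
  local host
  host=$(hostname)
  render_properties "$KAFKA_CONF/server.properties.tmp" \
    "$KAFKA_CONF/server.properties" \
    "$(broker_id_from_host "$host")" "$host" "$(fetch_az)"
}
```

Calling `main` as the last step before launching Kafka regenerates server.properties on every start, so a box that comes back in a different AZ picks up the right rack.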
Most of the Kafka Monitoring and Management tools completely fail w/ SSL. 🙁
