I am very new to Docker and Kafka, and have a simple Kafka Python publisher, shown below.
The following is my Dockerfile:
FROM python:3.10
WORKDIR /app
COPY . /app
RUN pip install --user pip==23.0.1 && pip install pipenv && pipenv install --system
ENV ENVIRONMENT=production
CMD ["python3", "src/producer.py"]
as well as my docker-compose YAML file:
version: '3'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
  publisher:
    container_name: publisher
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      - kafka
      - zookeeper
In producer.py I have:
import json
from time import sleep
from kafka import KafkaProducer

sleep(10)
producer = KafkaProducer(bootstrap_servers=['kafka:9092'], api_version=(0, 10))
json_path = "/data"
with open(json_path, 'r') as f:
    data = json.load(f)
print('Ready to publish')
for record in data:
    producer.send(topic, json.dumps(record).encode('utf-8'))
    print('Published message !!!')
producer.flush()
I first execute:
sudo docker-compose -f docker-compose.yml up -d
and then:
sudo docker run publisher
When I Ctrl+C, I see the traceback printed, and only the output from print('Ready to publish'), never from print('Published message !!!').
However, when I simply run
python3 src/producer.py
it prints both and publishes the messages without getting stuck at the send method.
I have looked at a few other threads, such as this one, but none really helped me find the mistake.
2 Answers
It looks like your Kafka producer isn't able to reach the Kafka broker in the Docker network.
Since you've got the kafka service in your docker-compose, that will act as the hostname within the Docker network. So it should just be a case of updating the hard-coded bootstrap_servers=['kafka:9092'] in your producer.py to use an environment variable, so it can be used both within the Docker network and when the script is run outside Docker (python3 src/producer.py).
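A minimal sketch of that change in producer.py, assuming the KAFKA_BOOTSTRAP_SERVERS variable name used below and localhost:9092 as the default:

import os
from kafka import KafkaProducer

# Use the broker address from the environment when running inside Docker;
# fall back to localhost:9092 when running the script directly on the host.
bootstrap_servers = os.environ.get('KAFKA_BOOTSTRAP_SERVERS', 'localhost:9092')

producer = KafkaProducer(bootstrap_servers=[bootstrap_servers], api_version=(0, 10))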
Then, under publisher in your docker-compose.yml, add the environment variable (a sketch of that addition follows). When running within the Docker network the script will then use the environment variable (KAFKA_BOOTSTRAP_SERVERS), but when running directly it will use the default value (localhost:9092).
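The exact snippet was not preserved in the original answer; a plausible version of the publisher service, assuming the variable name above and the in-network address kafka:9092, would be:

  publisher:
    container_name: publisher
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      KAFKA_BOOTSTRAP_SERVERS: kafka:9092  # hostname of the kafka service inside the Compose network
    depends_on:
      - kafka
      - zookeeper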
You have no JSON files at /data mounted, or otherwise copied in by the Dockerfile (everything is at /app), so there are no files to read and produce. Plus, open() works on files, not directory paths; one possible fix is sketched below.
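For illustration only (the answer doesn't prescribe a specific fix): you could mount a host directory into the publisher container and point json_path at an actual file. The ./data directory and records.json filename here are hypothetical.

  publisher:
    volumes:
      - ./data:/data  # mount a host directory containing the JSON file(s)

and in producer.py open a concrete file, e.g. json_path = "/data/records.json".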
Secondly, you need to remove KAFKA_ADVERTISED_HOST_NAME and replace it with KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092, otherwise the Python app stalls trying to connect to itself on port 9092, as the linked post and referenced blog indicate.
As the code has been written, that shouldn't work: kafka is an unknown host unless you've manually edited your /etc/hosts file… which you shouldn't.
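For illustration, the kafka service's environment block after that change would look something like this:

  kafka:
    image: wurstmeister/kafka
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092  # advertise the in-network hostname instead of localhost
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181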