
I am very new to Docker and Kafka, and have a simple Kafka Python publisher, shown below.

The following is my Dockerfile:

FROM python:3.10

WORKDIR /app

COPY . /app

RUN pip install --user pip==23.0.1 && pip install pipenv && pipenv install --system

ENV ENVIRONMENT=production
CMD ["python3", "src/producer.py"]

as well as my YAML file for Compose:

version: '3'

services:
  zookeeper:
    image: wurstmeister/zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"

  kafka:
    image: wurstmeister/kafka
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181

  publisher:
    container_name: publisher
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      - kafka
      - zookeeper

In producer.py I have:

import json
from time import sleep

from kafka import KafkaProducer

sleep(10)

producer = KafkaProducer(bootstrap_servers=['kafka:9092'], api_version=(0, 10))
json_path = "/data"
with open(json_path, 'r') as f:
    data = json.load(f)
    print('Ready to publish')
    for record in data:
        producer.send(topic, json.dumps(record).encode('utf-8'))
        print('Published message !!!')

producer.flush()

I first execute:

sudo docker-compose -f docker-compose.yml up -d

and then:

sudo docker run publisher

When I Ctrl+C, I see the traceback printed, and only the output of print('Ready to publish') rather than print('Published message !!!').

However, when I simply do

   python3 src/producer.py

it prints both lines and publishes the messages without getting stuck at the send call.

I have looked at a few other threads such as this one, but none really helped me find the mistake.


Answers


  1. It looks like your Kafka producer isn't able to reach the Kafka broker on the Docker network.

    Since you've got a kafka service in your docker-compose, that service name acts as the hostname within the Docker network.

    So it should just be a case of replacing the hard-coded bootstrap_servers=['kafka:9092'] in your producer.py with an environment variable, so the script works both within the Docker network and when run outside Docker (with python3 src/producer.py).

    import os

    # Get the broker address from the environment, with a local default
    kafka_bootstrap_servers = os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")

    # Use it in the producer
    producer = KafkaProducer(bootstrap_servers=kafka_bootstrap_servers, api_version=(0, 10))
    

    Then, under publisher in your docker-compose.yml add:

    environment:
       KAFKA_BOOTSTRAP_SERVERS: kafka:9092
    

    Then, when running within the Docker network it will use the environment variable (KAFKA_BOOTSTRAP_SERVERS), but when running directly it will use the default value (localhost:9092).
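    The fallback behaviour can be sketched as a small helper (the function name resolve_bootstrap_servers is purely illustrative):

    ```python
    import os

    def resolve_bootstrap_servers(env=None):
        # Illustrative helper: inside Docker, compose injects
        # KAFKA_BOOTSTRAP_SERVERS=kafka:9092; outside Docker the variable
        # is unset and the localhost default is used instead.
        if env is None:
            env = os.environ
        return env.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")

    print(resolve_bootstrap_servers({}))  # outside Docker: localhost:9092
    print(resolve_bootstrap_servers({"KAFKA_BOOTSTRAP_SERVERS": "kafka:9092"}))  # inside Docker: kafka:9092
    ```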

  2. You have no JSON files mounted at /data, or otherwise copied in via the Dockerfile (everything is copied to /app), so there are no files to read and produce. Also, open() works on files, not directory paths.
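    For example, assuming the JSON lives at ./data/records.json on the host (the file name is an assumption), a bind mount in docker-compose.yml would make it visible inside the container:

    ```yaml
      publisher:
        build:
          context: .
          dockerfile: Dockerfile
        volumes:
          - ./data:/data   # host ./data appears as /data inside the container
    ```

    producer.py would then open /data/records.json, not the /data directory itself.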

    Secondly, you need to remove KAFKA_ADVERTISED_HOST_NAME and replace it with KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092; otherwise the Python app stalls trying to connect to itself on port 9092, as the linked post and referenced blog indicate.
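    A sketch of the corrected kafka service, assuming clients only ever connect from inside the Compose network:

    ```yaml
      kafka:
        image: wurstmeister/kafka
        container_name: kafka
        ports:
          - "9092:9092"
        environment:
          KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
          KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
    ```

    With this advertised listener, the broker tells clients to reach it at kafka:9092, which the service name resolves to automatically inside the Compose network.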

    when I simply do…

    As the code is written, that shouldn't work: kafka is an unknown host unless you've manually edited your /etc/hosts file… which you shouldn't.
