
I have set up three networked Docker containers so that I can use Hadoop and Hive with PostgreSQL. The Docker setup is available at https://github.com/jcool12/hadoop-docker/tree/main/hivepost, where you can download the folders and files needed to run it. The Hadoop and PostgreSQL containers start up fine, but the Hive container reports these errors on startup:

Waiting for PostgreSQL to start...
2024-05-24 00:09:28 psql: error: could not connect to server: Connection refused
2024-05-24 00:09:28     Is the server running on host "postgres" (172.22.0.2) and accepting
2024-05-24 00:09:28     TCP/IP connections on port 5432?
2024-05-24 00:09:28 Postgres is unavailable - sleeping
2024-05-24 00:09:29 psql: error: could not connect to server: Connection refused
2024-05-24 00:09:29     Is the server running on host "postgres" (172.22.0.2) and accepting
2024-05-24 00:09:29     TCP/IP connections on port 5432?
2024-05-24 00:09:29 Postgres is unavailable - sleeping
2024-05-24 00:09:30 psql: error: could not connect to server: Connection refused
2024-05-24 00:09:30     Is the server running on host "postgres" (172.22.0.2) and accepting
2024-05-24 00:09:30     TCP/IP connections on port 5432?
2024-05-24 00:09:30 Postgres is unavailable - sleeping
2024-05-24 00:09:31 psql: error: could not connect to server: Connection refused
2024-05-24 00:09:31     Is the server running on host "postgres" (172.22.0.2) and accepting
2024-05-24 00:09:31     TCP/IP connections on port 5432?
2024-05-24 00:09:31 Postgres is unavailable - sleeping
2024-05-24 00:09:32 Postgres is up - checking for required tables
2024-05-24 00:09:32 Required tables not found. Proceeding with initialization...
2024-05-24 00:09:32 WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
2024-05-24 00:09:42 [25 blank log lines omitted, 00:09:42 to 00:09:43]
2024-05-24 00:09:28 Waiting for PostgreSQL to start at postgres...
2024-05-24 00:09:39 Initializing the schema to: 4.0.0
2024-05-24 00:09:39 Metastore connection URL:    jdbc:postgresql://postgres:5432/hive
2024-05-24 00:09:39 Metastore connection Driver :        org.postgresql.Driver
2024-05-24 00:09:39 Metastore connection User:   hiveuser
2024-05-24 00:09:39 Starting metastore schema initialization to 4.0.0
2024-05-24 00:09:39 Initialization script hive-schema-4.0.0.postgres.sql
2024-05-24 00:09:50 Initialization script completed
2024-05-24 00:09:51 Initializing Hive schema...
2024-05-24 00:09:56 Initializing the schema to: 4.0.0
2024-05-24 00:09:56 Metastore connection URL:    jdbc:postgresql://postgres:5432/hive
2024-05-24 00:09:56 Metastore connection Driver :        org.postgresql.Driver
2024-05-24 00:09:56 Metastore connection User:   hiveuser
2024-05-24 00:09:57 Starting metastore schema initialization to 4.0.0
2024-05-24 00:09:57 Initialization script hive-schema-4.0.0.postgres.sql
2024-05-24 00:10:01 2024-05-23 23:10:01: Starting Hive Metastore Server
2024-05-24 00:09:43 [241 blank log lines omitted, 00:09:43 to 00:09:50]
2024-05-24 00:09:50 Postgres is up - executing command
2024-05-24 00:09:51 Password for user hiveuser: 
2024-05-24 00:09:51 psql: error: fe_sendauth: no password supplied
2024-05-24 00:09:51 WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
2024-05-24 00:10:00 [11 blank log lines omitted]
2024-05-24 00:10:00 Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0)
2024-05-24 00:10:00 Schema initialization FAILED! Metastore state would be inconsistent!
2024-05-24 00:10:00 Underlying cause: java.io.IOException : Schema script failed, errorcode 2
2024-05-24 00:10:00 Use --verbose for detailed stacktrace.
2024-05-24 00:10:00 *** schemaTool failed ***
2024-05-24 00:10:01 WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
2024-05-24 00:10:01 WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
2024-05-24 00:10:13 2024-05-23 23:10:13: Starting HiveServer2
2024-05-24 00:10:13 WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
2024-05-24 00:10:24 Hive Session ID = 854876ef-81d3-409f-af2d-223ec90b215f
2024-05-24 00:11:34 Hive Session ID = b46b34ab-c27e-4898-afaa-71d79fc03f44
2024-05-24 00:12:34 Hive Session ID = 85678875-8418-45fc-aeb6-c75bce3b926e

Could you please help me resolve these issues so that Hive functions correctly with PostgreSQL?

UPDATE (24/05/24):

@datawookie, thank you. This addresses the errors encountered during Hive startup, but there still seems to be an issue with Hive. Let me elaborate:
• After connecting to the Hive container using winpty docker exec -it hive bash, I launch Hive by entering hive, as shown below. However, when I run show databases, I receive a "No current connection" message:

root@6fbd83a5ca0f:/# hive
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Beeline version 4.0.0 by Apache Hive
beeline> show databases;
No current connection

• Executing netstat -tuln | grep 10000 yields no output.
• Furthermore, attempting to connect using beeline -u jdbc:hive2://localhost:10000 results in the following error:

root@bf028412c4e7:/# beeline -u jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
24/05/24 10:34:52 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status. Enable verbose error messages (--verbose=true) for more information.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)

Despite placing the Hadoop conf files into the Hive folder and ensuring they were copied into the Hadoop conf directory to resolve the log4j warning, the errors above persist. My experience is limited, and I am not sure what else to try. Could you please help me resolve this?

2 Answers


  1. I suggest some updates to the entrypoint.sh file.

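    #!/bin/bash
    
    # Remove Hadoop's slf4j-reload4j JAR; -f avoids an error if it is already gone
    # (for example when the container is restarted).
    rm -f /opt/hadoop/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar
    
    # Provide a log4j.properties for Hadoop if one is missing.
    if [ ! -f "$HADOOP_CONF_DIR/log4j.properties" ]; then
      cp "$HADOOP_HOME/conf/log4j.properties" "$HADOOP_CONF_DIR/log4j.properties"
    fi
    
    # Likewise for Hive.
    if [ ! -f "$HIVE_HOME/conf/log4j.properties" ]; then
      cp "$HIVE_HOME/conf/log4j-hive.properties" "$HIVE_HOME/conf/log4j.properties"
    fi
    
    # psql reads the password from PGPASSWORD, so it does not prompt for one.
    export PGPASSWORD="$POSTGRES_PASSWORD"
    # -t prints tuples only and -A drops the padding, so the result is "1" if the
    # table exists and empty otherwise.
    TABLE_EXISTS=$(psql -h postgres -U "$POSTGRES_USER" -d "$POSTGRES_DB" -tAc "SELECT 1 FROM pg_tables WHERE schemaname='public' AND tablename='BUCKETING_COLS';")
    
    if [ -z "$TABLE_EXISTS" ]; then
      echo "Initializing Hive schema"
      # grep -v '^$' drops the empty lines that schematool prints.
      "$HIVE_HOME/bin/schematool" -initSchema -dbType postgres 2>&1 | grep -v '^$'
    else
      echo "Hive schema is already initialized, skipping schema initialization"
    fi
    
    # Stop any HiveServer2 process that is already running.
    pid=$(pgrep -f 'hive.*service.*hiveserver2')
    if [ -n "$pid" ]; then
      echo "Stopping running HiveServer2 process..."
      # Left unquoted on purpose: pgrep may return several PIDs.
      kill -9 $pid
    fi
    
    # Start the metastore in the background, then HiveServer2 in the foreground.
    "$HIVE_HOME/bin/hive" --service metastore &
    
    "$HIVE_HOME/bin/hive" --service hiveserver2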
    
    1. Check if the BUCKETING_COLS table exists and use this to determine whether or not to initialise the Hive schema.
    2. Add a grep at the end of the schematool command to suppress the furious flurry of empty lines in the logs.
    3. Remove the execution of wait-for-postgres.sh and move the wait into the docker-compose.yml file (see below). This is not strictly necessary, but I find it clearer when the dependency is stated explicitly in the Docker Compose configuration.
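
The table check in the first point can be looked at in isolation. The sketch below is only an illustration, not part of the original entrypoint.sh: schema_initialized is a hypothetical helper name, and it works on already-captured psql output rather than a live database. It shows why an empty result means "run schematool" and why stripping whitespace makes the test more robust, since psql's tuples-only mode can pad values.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): succeeds when the
# captured output of "psql -t" contains at least one non-blank value.
schema_initialized() {
  local result
  # Strip all whitespace; psql -t may pad values (" 1") or emit blank lines.
  result=$(printf '%s' "$1" | tr -d '[:space:]')
  [ -n "$result" ]
}

# Example using sample strings in place of real psql output.
if schema_initialized " 1"; then
  echo "schema present, skip schematool"
else
  echo "schema missing, run schematool -initSchema"
fi
```

The upper-case table name in the query matches the relation name reported in the question's log (relation "BUCKETING_COLS" already exists), which is why the lookup in pg_tables uses 'BUCKETING_COLS' rather than a lower-case name.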

    The docker-compose.yml below has been simplified to focus on the key changes (adding a health check for the postgres service). See 🚨 comments.

    version: '3.8'
    
    services:
      postgres:
        image: postgres:13
        container_name: postgres
        environment:
          POSTGRES_HOST: postgres
          POSTGRES_DB: hive
          POSTGRES_USER: hiveuser
          POSTGRES_PASSWORD: hivepassword
        # 🚨 Check that PostgreSQL is ready for connections before marking as "healthy".
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
          interval: 5s
          timeout: 5s
          retries: 5
    
      hive:
        build: ./hive  # Assumes you have a Dockerfile in the ./hive directory
        container_name: hive
        environment:
          POSTGRES_JDBC_VERSION: 42.2.24
          POSTGRES_HOST: postgres  # Ensure PostgreSQL host is set
          POSTGRES_DB: hive
          POSTGRES_USER: hiveuser
          POSTGRES_PASSWORD: hivepassword
        depends_on:
          # 🚨 Only start this container when PostgreSQL is "healthy".
          postgres:
            condition: service_healthy
          hadoop:
            condition: service_started
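
To reason about how long Compose will wait, it can help to think of the health check as a polling loop: run a readiness command every interval seconds and give up after a number of failed attempts. The shell sketch below is a rough approximation of that idea only. The wait_until name is hypothetical, and Docker's real health-check logic has more detail (for example a start_period setting) that is not modelled here.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to <retries> times, sleeping
# <interval> seconds between attempts. Succeeds on the first success.
wait_until() {
  local retries="$1"
  local interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt remains.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Demo with a command that always succeeds. Inside a container, the
# check could instead be something like:
#   wait_until 5 5 pg_isready -h postgres -d hive -U hiveuser
if wait_until 3 0 true; then
  echo "ready"
fi
```

With the values from the compose file (interval 5s, retries 5), a loop like this would give PostgreSQL roughly 20 to 25 seconds to start accepting connections before giving up.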
    
  2. Connect to the Hive container.

    docker exec -it hive /bin/bash
    

    Then connect in Beeline.

    1. Run hive.
    2. In Beeline run !connect jdbc:hive2://localhost:10000.
    3. Provide the username and password specified in docker-compose.yml.
    4. You can then run show databases and show tables.

    To be clear, this is a separate issue to the one in the original question. Ideally on Stack Overflow each question should address one issue.
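
The Beeline steps above can also be run non-interactively. The sketch below is an assumption-laden illustration rather than part of the original answer: hs2_url and port_open are hypothetical helper names, the credentials are the ones from the docker-compose.yml shown earlier, and the port check relies on bash's /dev/tcp feature. It first checks whether anything is listening on port 10000 (the same thing the netstat command in the question was testing) and only then calls beeline.

```shell
#!/bin/bash
# Hypothetical helper: build the HiveServer2 JDBC URL from host and port.
hs2_url() {
  printf 'jdbc:hive2://%s:%s\n' "$1" "$2"
}

# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

url=$(hs2_url localhost 10000)
if port_open localhost 10000; then
  # -u: JDBC URL, -n: user name, -p: password, -e: query to execute.
  beeline -u "$url" -n hiveuser -p hivepassword -e 'show databases;'
else
  echo "Nothing is listening on port 10000 yet; HiveServer2 may still be starting."
fi
```

If the port check keeps failing long after the container has started, the HiveServer2 output in docker logs hive is the next place to look.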
