
I'm trying to set up a Docker stack for a data science project, and I want to use Redis so that the services can exchange data.

I followed the documentation provided by Label Studio, but a lot of details are missing and my implementation doesn't work.

Specifically: Label Studio is able to register Redis as a data source but not as a data target, and as a source it doesn't retrieve my task data.

What I tried

My Docker Compose file

I removed every service unrelated to Label Studio; the variables come from a .env file.
The postgres part works fine, but I kept it in the example because it's part of the Redis config.

services:
  postgres:
    image: postgres:16-alpine
    container_name: postgres
    ports:
      - ${POSTGRES_PORT}:5432
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - PGDATA=/var/lib/postgresql/data/pgdata
      - POSTGRES_PORT=${POSTGRES_PORT}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d postgres"]
      interval: 10s
      timeout: 10s
      retries: 120
    volumes:
      - pgdata:/var/lib/postgresql/data:Z

  redis:
    image: redis:5-alpine
    container_name: redis
    ports:
      - 6379:6379
    volumes:
      - redisdata:/data
    healthcheck:
      test: [ "CMD", "redis-cli", "--raw", "incr", "ping" ]
      interval: 10s
      timeout: 10s
      retries: 120
    command: [ "redis-server",
               "--save", "60", "1",
               "--loglevel", "debug",
               "--requirepass", "${REDIS_PASSWORD}"]

  label-studio:
    image: heartexlabs/label-studio:latest
    container_name: label-studio
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    ports:
      - 8081:8080
    environment:
      - DJANGO_DB=default
      - POSTGRE_HOST=postgres
      - POSTGRE_PORT=${POSTGRE_PORT}
      - POSTGRE_NAME=${POSTGRE_NAME}
      - POSTGRE_USER=${POSTGRE_USER}
      - POSTGRE_PASSWORD=${POSTGRE_PASSWORD}
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - REDIS_LOCATION=redis:6379
      - REDIS_DB=0
      - REDIS_PASSWORD=${REDIS_PASSWORD}
    volumes:
      - lsdata:/label-studio/data
    command: ["label-studio",
              "--log-level", "DEBUG"]

volumes:
  pgdata:
    driver: local
  redisdata:
    driver: local
  lsdata:
    driver: local

Redis

Redis runs and holds task data. I tested the following formats:

labelstudio:ls-task-1 '{"text":"some text"}'
labelstudio:ls-task-2 '{"id":0, "data": {"texte": "some text"}}'


ls-task-1 '{"text":"some text"}'
ls-task-2 '{"id":0, "data": {"texte": "some text"}}'
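
For reference, entries shaped like the first format above can be generated rather than typed by hand, which rules out quoting mistakes in the JSON. This is a minimal sketch: the helper names `build_task_entries` and `write_entries` are hypothetical, and whether Label Studio expects the bare `{"text": ...}` form or the wrapped `{"data": {...}}` form from Redis is not settled by anything in this question.

```python
import json


def build_task_entries(texts, prefix="labelstudio:ls-task-"):
    """Return {key: json_string} pairs shaped like the first format tested above.

    The key prefix and the bare {"text": ...} payload are taken from the
    question; they are not a confirmed Label Studio requirement.
    """
    entries = {}
    for index, text in enumerate(texts, start=1):
        entries[f"{prefix}{index}"] = json.dumps({"text": text})
    return entries


def write_entries(client, entries):
    """Store each pair with SET.

    `client` is any object exposing set(key, value), for example a
    redis-py client.
    """
    for key, value in entries.items():
        client.set(key, value)
```

With redis-py installed, `client` could be `redis.Redis(host="redis", port=6379, db=0, password=...)`, using the host, port and database from the compose file above.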

Label Studio

Labeling interface config

<View>
  <Text name="text" value="$text"/>
  <View style="box-shadow: 2px 2px 5px #999; padding: 20px; margin-top: 2em; border-radius: 5px;">
    <Header value="Some themes"/>
    <Choices name="theme" toName="text" choice="multiple" showInLine="true">
      <Choice value="somevalue">Some Choice</Choice>
      <Choice value="othervalue">Other Choice</Choice>
    </Choices>
  </View>
</View>
<!-- {
  "data": {"text": "Some Text"}
} -->

Cloud Storage config

Storage Type : Redis
Path : labelstudio
Password :
Host : redis
Port : 6379

How it fails

As a data source

I see in the logs that Label Studio connects to Redis, but it always shows 0 tasks.

Label Studio logs

[2024-07-11 08:46:24,606] [urllib3.connectionpool::_make_request::474] [DEBUG] https://o227124.ingest.sentry.io:443 "POST /api/5820521/envelope/ HTTP/1.1" 200 2
[2024-07-11 08:46:43,954] [io_storages.base_models::sync::454] [INFO] Start syncing storage RedisImportStorage object (1)
[2024-07-11 08:46:43,964] [projects.models::_update_tasks_states::422] [INFO] Starting _update_tasks_states with params: Project risque-juridique (id=1) maximum_annotations 1 and percentage 100
[2024-07-11 08:46:43,971] [urllib3.connectionpool::_new_conn::1019] [DEBUG] Starting new HTTPS connection (1): tele.labelstud.io:443
[2024-07-11 08:46:43,971] [django.server::log_message::161] [INFO] "POST /api/storages/redis/1/sync HTTP/1.1" 200 618
[2024-07-11 08:46:43,971] [django.server::log_message::161] [INFO] "POST /api/storages/redis/1/sync HTTP/1.1" 200 618
[2024-07-11 08:46:44,454] [urllib3.connectionpool::_make_request::474] [DEBUG] https://tele.labelstud.io:443 "POST / HTTP/1.1" 200 0
[2024-07-11 08:47:24,609] [urllib3.connectionpool::_make_request::474] [DEBUG] https://o227124.ingest.sentry.io:443 "POST /api/5820521/envelope/ HTTP/1.1" 200 2

Redis logs

11 Jul 2024 08:46:43.953 - Accepted 192.168.48.6:42558
11 Jul 2024 08:46:43.954 - Client closed connection
11 Jul 2024 08:46:43.963 - Accepted 192.168.48.6:42564
11 Jul 2024 08:46:43.964 - Client closed connection
11 Jul 2024 08:46:45.526 - Accepted 127.0.0.1:57774
11 Jul 2024 08:46:45.526 - Client closed connection
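
The Redis log shows connections being accepted and closed, but not which keys were read. A minimal sketch of a check that lists the keys under the storage path and tests whether each value parses as JSON could look like the following. The helper `inspect_prefix` is hypothetical, and matching keys by prefix is an assumption about how the storage path is used, not confirmed Label Studio behaviour.

```python
import json


def inspect_prefix(client, prefix):
    """Map each key starting with `prefix` to a short JSON validity verdict.

    `client` is assumed to expose scan_iter(match=...) and get(key), as a
    redis-py client does.
    """
    report = {}
    for key in client.scan_iter(match=f"{prefix}*"):
        raw = client.get(key)
        # redis-py returns bytes unless decode_responses=True was set.
        if isinstance(key, bytes):
            key = key.decode("utf-8")
        try:
            json.loads(raw)
            report[key] = "valid JSON"
        except (TypeError, ValueError):
            # TypeError covers a missing value (None); ValueError covers
            # malformed JSON such as an unbalanced brace.
            report[key] = "not valid JSON"
    return report
```

An empty result would suggest the keys are not where the storage is looking (a different prefix or database), while a "not valid JSON" verdict points at the stored payload itself.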

As a data target

Label Studio shows:

Runtime error
Validation error

    validate_connection is not implemented

Version: 1.12.1
Label Studio logs
[2024-07-11 08:51:29,742] [core.utils.common::custom_exception_handler::89] [ERROR] c9a7909a-c865-4f8a-813b-3a4e7918d9a5 [ErrorDetail(string='validate_connection is not implemented', code='invalid')]
Traceback (most recent call last):
  File "/label-studio/label_studio/io_storages/api.py", line 82, in perform_create
    instance.validate_connection()
  File "/label-studio/label_studio/io_storages/base_models.py", line 218, in validate_connection
    raise NotImplementedError('validate_connection is not implemented')
NotImplementedError: validate_connection is not implemented

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/django/utils/decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/rest_framework/generics.py", line 242, in post
    return self.create(request, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/rest_framework/mixins.py", line 19, in create
    self.perform_create(serializer)
  File "/label-studio/label_studio/io_storages/api.py", line 84, in perform_create
    raise ValidationError(exc)
rest_framework.exceptions.ValidationError: [ErrorDetail(string='validate_connection is not implemented', code='invalid')]
[2024-07-11 08:51:29,748] [django.request::log_response::224] [WARNING] Bad Request: /api/storages/export/redis
[2024-07-11 08:51:29,748] [django.request::log_response::224] [WARNING] Bad Request: /api/storages/export/redis
[2024-07-11 08:51:29,748] [urllib3.connectionpool::_new_conn::1019] [DEBUG] Starting new HTTPS connection (1): tele.labelstud.io:443
[2024-07-11 08:51:29,749] [django.server::log_message::161] [WARNING] "POST /api/storages/export/redis?project=1 HTTP/1.1" 400 210
[2024-07-11 08:51:29,749] [django.server::log_message::161] [WARNING] "POST /api/storages/export/redis?project=1 HTTP/1.1" 400 210


Answers


  1. Chosen as BEST ANSWER

    I'm going to answer my own question. Long story short, it was indeed a bug: not only was the validation function for Redis output not written, but the Redis input link into Label Studio wasn't properly implemented either. The Label Studio team recently added the missing parameters (the Redis database ID, for one) to the Redis connection config, which now allows tasks to be retrieved from Redis.
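
Since the database ID was one of the missing parameters, it can help to confirm which database actually holds the task keys. The following is only a sketch: `keys_per_database` and the `make_client` factory are hypothetical names, and the range of 16 reflects the default number of databases in Redis.

```python
def keys_per_database(make_client, prefix, databases=range(16)):
    """Count the keys starting with `prefix` in each database index.

    `make_client(db)` is a hypothetical factory returning a client for the
    given index; the client is assumed to expose scan_iter(match=...), as
    redis-py clients do.
    """
    counts = {}
    for db in databases:
        client = make_client(db)
        counts[db] = sum(1 for _ in client.scan_iter(match=f"{prefix}*"))
    return counts
```

With redis-py, the factory could be `lambda db: redis.Redis(host="redis", port=6379, db=db, password=...)`. A non-zero count in a database other than the one configured in Label Studio would explain a sync that finds 0 tasks.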


  2. It looks like the devs have not written the validation function yet, or they may be raising the exception in the function intentionally, but there is no comment or explanation. If you can, open the file at the path given below in the Label Studio code repo and check whether your version contains a definition for the function. If not, you can try removing the exception raise, or write a stub definition, and see whether Redis connects (i.e. can be imported/exported as a data source).

    But I think stubbing the method won't work, as they have left a bunch of other functions in the model class without definitions too.

    You can try suppressing all the exception raises in the class methods, but I do not think it will work. I have not tried Redis as a data I/O source myself, so I am not sure how you would implement the method definitions.

    The file throwing the exception is at the following path:

    • {Label Studio root}/label_studio/io_storages/base_models.py

    The code I am referring to is at line 217; the traceback in the question shows the raise NotImplementedError('validate_connection is not implemented') at line 218 of that file.

    PS: I am writing this as an answer because I needed to attach an image as supporting material, and the explanation was too long to fit without being split across multiple comments.
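
The stubbing idea from this answer can be illustrated with a self-contained sketch. None of the classes below are Label Studio's real code: `BaseStorage`, `StubbedRedisExportStorage` and the local `ValidationError` are hypothetical stand-ins that only mirror the two frames visible in the traceback, and treating a PING round trip as a sufficient connection check is an assumption.

```python
class ValidationError(Exception):
    """Stand-in for rest_framework.exceptions.ValidationError."""


class BaseStorage:
    """Hypothetical base class mirroring the behaviour seen in the traceback."""

    def validate_connection(self):
        # Same message as the error reported in the question.
        raise NotImplementedError('validate_connection is not implemented')


class StubbedRedisExportStorage(BaseStorage):
    """Hypothetical subclass that overrides the unimplemented method."""

    def __init__(self, client):
        # `client` is assumed to expose ping(), as a redis-py client does.
        self.client = client

    def validate_connection(self):
        # Assumption: a successful PING is enough to call the link valid.
        self.client.ping()


def perform_create(instance):
    """Mirror of the traceback: any validation failure becomes a ValidationError."""
    try:
        instance.validate_connection()
    except Exception as exc:
        raise ValidationError(exc)
```

Getting past validation this way says nothing about the other undefined methods mentioned above, so an export could still fail later.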
