I'm trying to set up a Docker stack for a data science project, and I want to use Redis so that services can exchange data.
I followed the documentation provided by Label Studio, but many details are missing and my implementation doesn't work.
Specifically: Label Studio can register Redis as a data source but not as a data target, and as a source it doesn't retrieve my task data.
What I tried
My Docker Compose file
I removed every service unrelated to Label Studio; the variables come from a .env file.
The Postgres part works fine, but I kept it in the example because it's part of the Redis config.
services:
  postgres:
    image: postgres:16-alpine
    container_name: postgres
    ports:
      - ${POSTGRES_PORT}:5432
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - PGDATA=/var/lib/postgresql/data/pgdata
      - POSTGRES_PORT=${POSTGRES_PORT}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready", "-d", "postgres"]
      interval: 10s
      timeout: 10s
      retries: 120
    volumes:
      - pgdata:/var/lib/postgresql/data:Z
  redis:
    image: redis:5-alpine
    container_name: redis
    ports:
      - 6379:6379
    volumes:
      - redisdata:/data
    healthcheck:
      test: [ "CMD", "redis-cli", "--raw", "incr", "ping" ]
      interval: 10s
      timeout: 10s
      retries: 120
    command: [ "redis-server",
               "--save", "60", "1",
               "--loglevel", "debug",
               "--requirepass", "${REDIS_PASSWORD}"]
  label-studio:
    image: heartexlabs/label-studio:latest
    container_name: label-studio
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    ports:
      - 8081:8080
    environment:
      - DJANGO_DB=default
      - POSTGRE_HOST=postgres
      - POSTGRE_PORT=${POSTGRE_PORT}
      - POSTGRE_NAME=${POSTGRE_NAME}
      - POSTGRE_USER=${POSTGRE_USER}
      - POSTGRE_PASSWORD=${POSTGRE_PASSWORD}
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - REDIS_LOCATION=redis:6379
      - REDIS_DB=0
      - REDIS_PASSWORD=${REDIS_PASSWORD}
    volumes:
      - lsdata:/label-studio/data
    command: ["label-studio",
              "--log-level", "DEBUG"]
volumes:
  pgdata:
    driver: local
  redisdata:
    driver: local
  lsdata:
    driver: local
Redis
Redis runs and contains task data. I tested the following key/value formats:
labelstudio:ls-task-1 '{"text":"some text"}'
labelstudio:ls-task-2 '{"id":0, "data": {"texte": "some text"}}'
ls-task-1 '{"text":"some text"}'
ls-task-2 '{"id":0, "data": {"texte": "some text"}}'
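For reproducibility, keys like the ones above could be written by a short script instead of by hand. The following is a minimal sketch assuming the redis-py client is installed; `task_key`, `task_payload`, `write_tasks`, and `connect_from_env` are hypothetical helper names, and the `{"data": {...}}` wrapper is only one of the two shapes tried above, not a confirmed requirement of the Redis storage.

```python
import json


def task_key(prefix: str, index: int) -> str:
    """Build a key such as 'labelstudio:ls-task-1' (the prefix may be empty)."""
    name = f"ls-task-{index}"
    return f"{prefix}:{name}" if prefix else name


def task_payload(text: str, wrap_in_data: bool = True) -> str:
    """Serialize one task as JSON, optionally wrapped in a 'data' object."""
    body = {"text": text}
    return json.dumps({"data": body} if wrap_in_data else body)


def write_tasks(client, prefix: str, texts: list[str]) -> list[str]:
    """Store each text under its own key with SET and return the keys written."""
    keys = []
    for index, text in enumerate(texts, start=1):
        key = task_key(prefix, index)
        client.set(key, task_payload(text))
        keys.append(key)
    return keys


def connect_from_env():
    """Open a connection from the host; assumes port 6379 is published as in the compose file."""
    import os

    import redis  # assumption: redis-py is installed (pip install redis)

    return redis.Redis(
        host="localhost",
        port=6379,
        db=0,
        password=os.environ.get("REDIS_PASSWORD"),
    )


# Example call (requires a running Redis):
# write_tasks(connect_from_env(), "labelstudio", ["some text", "other text"])
```

Passing the client in as an argument keeps the key and payload logic testable without a running Redis instance.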
Label Studio
Labeling interface config
<View>
  <Text name="text" value="$text"/>
  <View style="box-shadow: 2px 2px 5px #999; padding: 20px; margin-top: 2em; border-radius: 5px;">
    <Header value="Some themes"/>
    <Choices name="theme" toName="text" choice="multiple" showInLine="true">
      <Choice value="somevalue">Some Choice</Choice>
      <Choice value="othervalue">Other Choice</Choice>
    </Choices>
  </View>
</View>
<!-- {
  "data": {"text": "Some Text"}
} -->
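The `$text` variable in the config refers to a `text` field in the task data, so it may be worth checking each stored payload against the config. The sketch below is a hypothetical helper, not part of Label Studio: it extracts the `$name` variables from `value="..."` attributes with a simple regular expression and reports which ones a given JSON payload lacks. It is not established that this is the cause of the failed sync.

```python
import json
import re

# Matches value="$name" attributes in a labeling config (simple variables only).
VARIABLE_PATTERN = re.compile(r'value="\$([A-Za-z_][A-Za-z0-9_]*)"')


def config_variables(labeling_config: str) -> set[str]:
    """Return the task-data field names referenced by the labeling config."""
    return set(VARIABLE_PATTERN.findall(labeling_config))


def missing_fields(labeling_config: str, raw_task: str) -> set[str]:
    """Return the referenced fields that are absent from a JSON task payload."""
    task = json.loads(raw_task)
    # Accept both the wrapped {"data": {...}} shape and the bare {...} shape.
    data = task.get("data", task)
    return config_variables(labeling_config) - set(data)
```

With the payloads listed earlier, `missing_fields` returns an empty set for `{"text":"some text"}` and `{"text"}` for the payloads keyed `texte`.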
Cloud Storage config
Storage Type : Redis
Path : labelstudio
Password :
Host : redis
Port : 6379
How it fails
As a data source
The logs show Label Studio connecting to Redis, but the sync always reports 0 tasks.
Label Studio logs
[2024-07-11 08:46:24,606] [urllib3.connectionpool::_make_request::474] [DEBUG] https://o227124.ingest.sentry.io:443 "POST /api/5820521/envelope/ HTTP/1.1" 200 2
[2024-07-11 08:46:43,954] [io_storages.base_models::sync::454] [INFO] Start syncing storage RedisImportStorage object (1)
[2024-07-11 08:46:43,964] [projects.models::_update_tasks_states::422] [INFO] Starting _update_tasks_states with params: Project risque-juridique (id=1) maximum_annotations 1 and percentage 100
[2024-07-11 08:46:43,971] [urllib3.connectionpool::_new_conn::1019] [DEBUG] Starting new HTTPS connection (1): tele.labelstud.io:443
[2024-07-11 08:46:43,971] [django.server::log_message::161] [INFO] "POST /api/storages/redis/1/sync HTTP/1.1" 200 618
[2024-07-11 08:46:43,971] [django.server::log_message::161] [INFO] "POST /api/storages/redis/1/sync HTTP/1.1" 200 618
[2024-07-11 08:46:44,454] [urllib3.connectionpool::_make_request::474] [DEBUG] https://tele.labelstud.io:443 "POST / HTTP/1.1" 200 0
[2024-07-11 08:47:24,609] [urllib3.connectionpool::_make_request::474] [DEBUG] https://o227124.ingest.sentry.io:443 "POST /api/5820521/envelope/ HTTP/1.1" 200 2
Redis logs
11 Jul 2024 08:46:43.953 - Accepted 192.168.48.6:42558
11 Jul 2024 08:46:43.954 - Client closed connection
11 Jul 2024 08:46:43.963 - Accepted 192.168.48.6:42564
11 Jul 2024 08:46:43.964 - Client closed connection
11 Jul 2024 08:46:45.526 - Accepted 127.0.0.1:57774
11 Jul 2024 08:46:45.526 - Client closed connection
As a data target
Label Studio shows:
Runtime error
Validation error
validate_connection is not implemented
Version: 1.12.1
Label Studio logs
[2024-07-11 08:51:29,742] [core.utils.common::custom_exception_handler::89] [ERROR] c9a7909a-c865-4f8a-813b-3a4e7918d9a5 [ErrorDetail(string='validate_connection is not implemented', code='invalid')]
Traceback (most recent call last):
File "/label-studio/label_studio/io_storages/api.py", line 82, in perform_create
instance.validate_connection()
File "/label-studio/label_studio/io_storages/base_models.py", line 218, in validate_connection
raise NotImplementedError('validate_connection is not implemented')
NotImplementedError: validate_connection is not implemented
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/rest_framework/views.py", line 506, in dispatch
response = handler(request, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/django/utils/decorators.py", line 43, in _wrapper
return bound_method(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/rest_framework/generics.py", line 242, in post
return self.create(request, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/rest_framework/mixins.py", line 19, in create
self.perform_create(serializer)
File "/label-studio/label_studio/io_storages/api.py", line 84, in perform_create
raise ValidationError(exc)
rest_framework.exceptions.ValidationError: [ErrorDetail(string='validate_connection is not implemented', code='invalid')]
[2024-07-11 08:51:29,748] [django.request::log_response::224] [WARNING] Bad Request: /api/storages/export/redis
[2024-07-11 08:51:29,748] [django.request::log_response::224] [WARNING] Bad Request: /api/storages/export/redis
[2024-07-11 08:51:29,748] [urllib3.connectionpool::_new_conn::1019] [DEBUG] Starting new HTTPS connection (1): tele.labelstud.io:443
[2024-07-11 08:51:29,749] [django.server::log_message::161] [WARNING] "POST /api/storages/export/redis?project=1 HTTP/1.1" 400 210
[2024-07-11 08:51:29,749] [django.server::log_message::161] [WARNING] "POST /api/storages/export/redis?project=1 HTTP/1.1" 400 210
Answers
I'm going to answer my own question. Long story short, it was indeed a bug: not only was the validation function for the Redis output never written, but the Redis input link into Label Studio wasn't properly implemented either. The Label Studio team recently added the missing parameters (the Redis database id, for one) to the Redis connection config, which now allows tasks to be retrieved from Redis.
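Because the fix involved the Redis database id, one way to spot a mismatch between where the tasks were written and where the storage reads is to count matching keys in each logical database. This is a minimal sketch assuming the redis-py client; `count_keys_per_db` and `redis_factory` are hypothetical names, and the range of 16 databases is only the Redis default, not something taken from the setup above.

```python
def count_keys_per_db(make_client, prefix: str, databases=range(16)) -> dict[int, int]:
    """Count keys starting with `prefix` in each database; omit empty databases."""
    counts = {}
    for db in databases:
        client = make_client(db)
        # scan_iter walks the keyspace incrementally instead of blocking like KEYS.
        found = sum(1 for _ in client.scan_iter(match=f"{prefix}*"))
        if found:
            counts[db] = found
    return counts


def redis_factory(host: str = "localhost", port: int = 6379, password=None):
    """Return a function that opens a connection to a given database index."""
    import redis  # assumption: redis-py is installed (pip install redis)

    return lambda db: redis.Redis(host=host, port=port, db=db, password=password)


# Example call (requires a running Redis):
# print(count_keys_per_db(redis_factory(password="..."), "labelstudio"))
```

If the keys appear under a database index different from the one configured in the storage settings, the sync would connect successfully and still find nothing.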
It looks like the developers have not written the validation function yet, or they may have raised the exception in the function intentionally, but there is no comment or explanation. If you can, follow the path below in the Label Studio code repository and check whether your version has a definition for the function. If not, you can try removing the raise statement, or write a stub definition, and see whether redis connects (i.e. can be imported from and exported to as a data source).
However, I don't think stubbing the method will work, because the model class leaves a bunch of other functions without definitions too. You could try suppressing every exception raised in the class methods, but I doubt that will work either. I have not tried redis as a data I/O source myself, so I am not sure how you would implement the method definitions.
The file throwing the exception is at the following path:
{Label Studio root}/label_studio/io_storages/base_models.py
The code I am referring to is on line 217.
PS: I am writing this as an answer because I had to provide an image as a supporting document, and the explanation was too long and would have had to be split across several comments.
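To illustrate what a stub definition might look like, here is a minimal sketch. It is not Label Studio's code: `BaseStorageStandIn` only mimics the behaviour visible in the traceback (a base method that raises `NotImplementedError`), and `RedisStorageSketch` with its `client_factory` argument is hypothetical. The only client call used is `ping()`, which redis-py provides.

```python
class BaseStorageStandIn:
    """Stand-in for the base model: the method raises, as in the traceback."""

    def validate_connection(self, client=None):
        raise NotImplementedError("validate_connection is not implemented")


class RedisStorageSketch(BaseStorageStandIn):
    """Hypothetical subclass that overrides the unimplemented method."""

    def __init__(self, client_factory):
        # client_factory is any zero-argument callable returning a Redis client.
        self._client_factory = client_factory

    def validate_connection(self, client=None):
        """Raise ValueError if the Redis server cannot be reached."""
        client = client or self._client_factory()
        try:
            alive = client.ping()
        except Exception as exc:
            raise ValueError(f"Cannot reach Redis: {exc}") from exc
        if not alive:
            raise ValueError("Redis did not answer PING")
```

As the answer notes, other methods on the same class may also be left undefined, so overriding this one method alone may not be enough to make the export target work.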