dask Questions

PostgreSQL – The fastest and most efficient way to remove duplicates by one column from a ~100GB CSV

February 1, 2024
Repzz
2 Answers

I have a ~100GB CSV file with the following columns: sex;name;dob;hash. This file was created after some processing of another .csv file. It can contain tuples, which is why there is this hash column. What I need is to delete duplicates from…


Debian – Error accessing an attribute of a dask_cudf Series when it is called from a user-defined function

June 22, 2023
mtnt
2 Answers

My question is related to my previous one at Error of using parallelizing data processing by "sentence_transformers" on 2 GPUs from Jupyter notebook. I have tried a new solution because I got an error with the proposed one. I…


Dask Distributed: Limit Dask distributed worker to 1 CPU – Docker

March 13, 2023
SMI
2 Answers

My system has 4 CPUs and 16 GB of RAM. My aim is to deploy Dask distributed workers that each use only 1 CPU to run the code assigned to them. I am deploying a scheduler container and worker containers using Docker to…


Running Dask map_partitions functions in multiple workers – Docker

March 8, 2022
ps0604
2 Answers

I have a Dask architecture implemented with five Docker containers: a client, a scheduler, and three workers. I also have a large Dask dataframe stored in Parquet format in a Docker volume. The dataframe was created with 3 partitions, so…
