
This is my setup:
I’m using Django and calling Julia from a Django RQ worker. Using a worker avoids threading problems, because there are no other threads.

In Julia I’m using multithreading for the calculations (some fancy technology voodoo). So far, everything is fine.

If I start Django and the worker, I can run one calculation and everything is fine. But the second time, with different data, I get this error:

┌ Error: Error adding value to column :t.
└ @ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/dataframe/dataframe.jl:1644

After this, the calculation runs to the end and then I get a really long error message with a stack trace, but there is no point at which I can catch the problem.

Restarting Django and the worker does not clear the problem. I have to delete "mymodule.pyc" and restart; then the calculation runs again… once… and the second time the error appears again.

What do I mean by different data? I have a pool of pieces on which I calculate something. Let’s call them a, b, c, d, ….

So if I run the calculation for abc, it’s okay. The second time for abc, it’s okay too. But if I then take cde, it throws the error.
But cde is not the problem. If I run cde as the first calculation, it works, and then it crashes while running abc. I hope that’s not too confusing.

How I use Julia multithreading:

import os
from multiprocessing import cpu_count

# Read the number of CPUs and set the Julia threads variable.
# This has to happen before the import below initializes Julia.
os.environ["JULIA_NUM_THREADS"] = str(cpu_count())

# Import (py)julia
from julia import Main as jl

# Do something
jl.eval('some code')
jl.include("Main.jl")
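
The ordering in the snippet above matters: Julia reads JULIA_NUM_THREADS once, when its runtime starts, so setting it after Julia has been initialized has no effect. A minimal sketch of a guard for that ordering follows. `set_julia_threads` is a hypothetical helper name, and checking for `"julia"` in `sys.modules` is a deliberately conservative assumption (it refuses as soon as the package has been imported at all):

```python
import os
import sys
from multiprocessing import cpu_count


def set_julia_threads(env=os.environ, loaded_modules=sys.modules, n_threads=None):
    """Set JULIA_NUM_THREADS, refusing if pyjulia was already imported.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if "julia" in loaded_modules:
        # Julia reads the variable at startup, so a later change is ignored.
        raise RuntimeError("set JULIA_NUM_THREADS before importing julia")
    n = cpu_count() if n_threads is None else n_threads
    env["JULIA_NUM_THREADS"] = str(n)
    return n
```

Calling `set_julia_threads()` at the top of the worker module, before `from julia import Main as jl`, would make a wrong import order fail loudly instead of silently running single-threaded.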

Versions I use:
Debian 10.7
Python 3.7.9
pyjulia 0.5.6
Julia 1.6.1
DataFrames 1.1.1 (0.21.8 didn’t work either)

2 Answers


  1. Chosen as BEST ANSWER

    Well, this error originates in Julia, not in pyjulia.

    It's just normal Julia behavior, because some operations are not thread-safe.

    https://github.com/JuliaData/DataFrames.jl/issues/2795

    Solving this problem directly in the Julia code also eliminates the issues with pyjulia and the .pyc files.

    A workaround is to fill the DataFrame with missing values before running the parallelized code. Then, instead of using push!, replace the missing values with the results you wanted to push.
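
The pattern behind this workaround (reserve one slot per result up front, then let each parallel task write only to its own slot) is sketched here in Python with the standard library. It is an analogue, not the Julia code itself: `None` stands in for Julia's `missing`, and `compute` and `fill_preallocated` are hypothetical names. In Julia the same idea would be an indexed assignment into a pre-sized column instead of `push!`.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(piece):
    # Placeholder for the real calculation on one piece (hypothetical).
    return piece.upper()


def fill_preallocated(pieces, max_workers=4):
    # Reserve one slot per result before any parallel work starts.
    results = [None] * len(pieces)

    def work(i):
        # Each task writes only to its own index, so the container
        # never has to grow while other tasks are running.
        results[i] = compute(pieces[i])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list(...) waits for all tasks and re-raises any task exception.
        list(pool.map(work, range(len(pieces))))
    return results


print(fill_preallocated(["a", "b", "c"]))
```

Any slot that is still `None` afterwards marks a piece whose result was never written, which is easier to spot than a row that was lost during a concurrent append.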


  2. This does not seem to be threading-related.
    You simply have mismatched data types somewhere in your code. See this example:

    julia> using DataFrames
    
    julia> df = DataFrame(A = String[], B = Int[])
    0×2 DataFrame
    
    julia> push!(df, ("hello", 1))
    1×2 DataFrame
     Row │ A       B
         │ String  Int64
    ─────┼───────────────
       1 │ hello       1
    
    julia> push!(df, (1, "hello"))
    ┌ Error: Error adding value to column :A.
    

    If for some reason you are unable to find the error, you could try widening the column types in your data frame, for example:

    julia> df.A = Vector{Any}(df.A); 
    
    julia> df.B = Vector{Any}(df.B);
    
    julia> push!(df, (1, "hello"))
    2×2 DataFrame
     Row │ A      B
         │ Any    Any
    ─────┼──────────────
       1 │ hello  1
       2 │ 1      hello
    

    This gives you a chance to see how the data gets added to your DataFrame.
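
Another way to locate mismatched types is to check each row against the expected column types just before it is added, so the offending value is reported directly. A minimal Python analogue of such a check follows. `find_type_mismatches` and the `schema` mapping are hypothetical and are not part of DataFrames.jl or pyjulia.

```python
def find_type_mismatches(schema, row):
    """Return (column, expected_type_name, actual_value) for each bad value.

    `schema` maps column names to expected Python types (illustrative only).
    """
    mismatches = []
    for (column, expected), value in zip(schema.items(), row):
        if not isinstance(value, expected):
            mismatches.append((column, expected.__name__, value))
    return mismatches


schema = {"A": str, "B": int}
print(find_type_mismatches(schema, ("hello", 1)))   # row matches the schema
print(find_type_mismatches(schema, (1, "hello")))   # values in swapped order
```

Logging the returned list for every row that is about to be added shows which column and which value trigger the error, without changing the column types.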
