
I have a SQL table and have loaded it into a dataframe with Python.

Then I calculated new values for a new column.

As you know, it’s easy to add a new column of values to a dataframe as long as the lengths of the dataframe and the added column are the same.
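For example, a minimal sketch of that step (the column names `Time` and `DhE` are taken from the question; the data here is made up for illustration):

```python
import pandas as pd

# Hypothetical dataframe standing in for the SQL table
df = pd.DataFrame({
    "Time": pd.date_range("2024-01-01", periods=3, freq="h"),
    "Val": [1.0, 2.0, 3.0],
})

# New values computed elsewhere; the list must match len(df)
new_values = [10.0, 20.0, 30.0]
df["DhE"] = new_values  # adds the column in place
```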

Now I can take the whole dataframe and make a new MySQL table.

df.to_sql(name=key+".1", con=engine , if_exists ='fail', index=False)

But it seems a bit too complicated to me to do it this way.

So I tried to insert the values into the new table by using

for index, x in enumerate(df["Time"]):
    try:
        mycursor = mydb.cursor()
        mycursor.execute("UPDATE " + key + " SET DhE=" + str(df["DhE"][index]) + " WHERE Time='" + str(x) + "'")
        mydb.commit()
    except Exception:
        print("UPDATE " + key + " SET DhE=" + str(df["DhE"][index]) + " WHERE Time=" + str(x) + " ****error")

This is not usable.

  1. The WHERE clause compares against a Date() column, which I chose as the table’s primary key. That was a bad choice.
    Don’t do this: the key is unique, but it is not usable in practice.

  2. By default, one WHERE-clause update per row is far too slow for a table of 731,911 rows when filling a whole column.

My conclusion and my question:
Do I have to write a whole new table into my database, or is there a way to just add the column (with values!), which has the same length as the table, to the MySQL table?

for example, using

INSERT INTO XY.new_Column VALUES "whole new column values"

2 Answers


  1. Chosen as BEST ANSWER

    Thank you very much for your answer.

    I got a warning:

    'VALUES function' is deprecated and will be removed in a future release. Please use an alias (INSERT INTO ... VALUES (...) AS alias) and replace VALUES(col) in the ON DUPLICATE KEY UPDATE clause with alias.col instead.

    I changed it to:

    query = "INSERT INTO your_table (Time, new_column_name) VALUES (%s, %s) AS x ON DUPLICATE KEY UPDATE new_column_name=x.new_column_name"
    

    In addition, the Time column has to be the primary key, so I set

    ALTER TABLE your_table
    MODIFY COLUMN Time datetime NOT NULL
    

    and

    ALTER TABLE your_table ADD PRIMARY KEY (Time)
    

    After that, MySQL can detect the duplicate key, and ON DUPLICATE KEY UPDATE can do its magic.

    Now I will benchmark this against writing the whole new table. :)


  2. The way you are inserting values into the MySQL table is very inefficient: it iterates through the rows and updates them one by one. This sends a huge number of individual queries to the database, which is extremely slow. Instead:

    1. Connect to your database.
    2. Use ALTER TABLE to add the new column.
    3. Bulk insert the values using INSERT INTO ... ON DUPLICATE KEY UPDATE.
    4. Commit the changes.

    In Python, this should work:

    import pymysql
    
    # Connect to database
    connection = pymysql.connect(host='localhost', user='username', password='password', database='mydb')
    cursor = connection.cursor()
    
    # Add new column
    cursor.execute("ALTER TABLE your_table ADD new_column_name datatype")
    
    # Bulk insert values (assume values_list contains (time, value) tuples)
    query = "INSERT INTO your_table (Time, new_column_name) VALUES (%s, %s) ON DUPLICATE KEY UPDATE new_column_name=VALUES(new_column_name)"
    cursor.executemany(query, values_list)
    
    # Commit and close
    connection.commit()
    cursor.close()
    connection.close()
    

    Replace your_table with your table name, new_column_name with the column name, and datatype with the column’s datatype. Also, prepare values_list as a list of (time, value) tuples for the new column.
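    One way to build that values_list from the dataframe (a sketch; the column names `Time` and `DhE` are from the question, and the sample rows are made up):

    import pandas as pd

    # Stand-in for the dataframe loaded from the SQL table
    df = pd.DataFrame({
        "Time": ["2024-01-01 00:00:00", "2024-01-01 01:00:00"],
        "DhE": [1.5, 2.5],
    })

    # List of (time, value) tuples, ready for cursor.executemany
    values_list = list(df[["Time", "DhE"]].itertuples(index=False, name=None))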
