I have a CSV file that is not formatted in the way AGE expects for loading. My task was to transform it into a new file that AGE can read to create nodes, as specified in the documentation. For that, I wrote a Python script that creates the new file, connects to Postgres, and performs the queries. I thought this could be useful: if someone has CSV files that are not in the specified format and wants to create nodes and edges from them in AGE, the script could quickly solve the problem.
Here is the old csv file (ProductsData.csv), it contains the data of products that have been purchased by other users (identified by their user_id), the store where the product was purchased from (identified by their store_id), and also the product_id, which is the id of the node:
product_name,price,description,store_id,user_id,product_id
iPhone 12,999,"Apple iPhone 12 - 64GB, Space Gray",1234,1001,123
Samsung Galaxy S21,899,"Samsung Galaxy S21 - 128GB, Phantom Black",5678,1002,124
AirPods Pro,249,"Apple AirPods Pro with Active Noise Cancellation",1234,1003,125
Sony PlayStation 5,499,"Sony PlayStation 5 Gaming Console, 1TB",9012,1004,126
Here is the Python file:
import psycopg2
import age
import csv

def read_csv(csv_file):
    with open(csv_file, 'r') as file:
        reader = csv.reader(file)
        rows = list(reader)
    return rows

def create_csv(csv_file):
    new_header = ['id', 'product_name', 'description', 'price', 'store_id', 'user_id']
    property_order = [5, 0, 2, 1, 3, 4] # Reorder the properties accordingly.
    rows = read_csv(csv_file)
    new_csv_file = 'products.csv'
    with open(new_csv_file, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(new_header)
        # Write each row with reordered properties.
        for row in rows[1:]:
            new_row = [row[i] for i in property_order]
            writer.writerow(new_row)
    print(f"New CSV file '{new_csv_file}' has been created with the desired format.")

def load_csv_nodes(csv_file, graph_name, conn):
    with conn.cursor() as cursor:
        try:
            cursor.execute("""LOAD 'age';""")
            cursor.execute("""SET search_path = ag_catalog, "$user", public;""")
            cursor.execute("""SELECT load_labels_from_file(%s, 'Node', %s)""", (graph_name, csv_file,) )
            conn.commit()
        except Exception as ex:
            print(type(ex), ex)
            conn.rollback()

def main():
    csv_file = 'ProductsData.csv'
    create_csv(csv_file)
    new_csv_file = 'products.csv'
    GRAPH_NAME = 'csv_test_graph'
    conn = psycopg2.connect(host="localhost", port="5432", dbname="database", user="user", password="password")
    age.setUpAge(conn, GRAPH_NAME)
    path_to_csv = '/path/to/folder/' + new_csv_file
    load_csv_nodes(path_to_csv, GRAPH_NAME, conn)

main()
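As a side note on the reordering step, the hard-coded index list in create_csv breaks as soon as the input columns change order. A minimal sketch of an alternative that selects columns by name is shown here. The function name reorder_columns and its id_column parameter are made up for illustration and are not part of the script above. It assumes only that the id column has to come first and be named id, so the remaining columns keep their original order instead of the order used in new_header.

```python
import csv
import io

def reorder_columns(src, dst, id_column="product_id"):
    """Copy CSV rows from src to dst with id_column first, renamed to 'id'."""
    reader = csv.DictReader(src)
    # Keep the remaining columns in their original order after the id.
    others = [name for name in reader.fieldnames if name != id_column]
    writer = csv.writer(dst)
    writer.writerow(["id"] + others)
    for row in reader:
        writer.writerow([row[id_column]] + [row[name] for name in others])

# Small in-memory demonstration using one row from ProductsData.csv.
sample = io.StringIO(
    "product_name,price,description,store_id,user_id,product_id\n"
    'iPhone 12,999,"Apple iPhone 12 - 64GB, Space Gray",1234,1001,123\n'
)
out = io.StringIO()
reorder_columns(sample, out)
print(out.getvalue())
```

With real files, src and dst would be file objects opened with newline='' as in create_csv.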
The generated file:
id,product_name,description,price,store_id,user_id
123,iPhone 12,"Apple iPhone 12 - 64GB, Space Gray",999,1234,1001
124,Samsung Galaxy S21,"Samsung Galaxy S21 - 128GB, Phantom Black",899,5678,1002
125,AirPods Pro,Apple AirPods Pro with Active Noise Cancellation,249,1234,1003
126,Sony PlayStation 5,"Sony PlayStation 5 Gaming Console, 1TB",499,9012,1004
But then, when running the script, it shows the following message:
<class 'psycopg2.errors.InvalidParameterValue'> label_id must be 1 .. 65535
The ids are set between 1 and 65535, and I don’t understand why this error message is showing.
2 Answers
For how to use load_labels_from_file, please refer to the regression test file. It shows how to use all the commands. You first need to create the Node vlabel before calling load_labels_from_file. Then run the script as it is.
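The ordering described in this answer could be sketched in Python roughly as follows. This is an illustration under assumptions, not the answerer's exact command: the helper name create_label_then_load is hypothetical, and it assumes AGE's create_vlabel(graph_name, label_name) function is called through the same psycopg2 cursor the question's script already uses.

```python
def create_label_then_load(cursor, graph_name, label, csv_path):
    """Create the vertex label, then bulk-load nodes into it from csv_path."""
    cursor.execute("LOAD 'age';")
    cursor.execute('SET search_path = ag_catalog, "$user", public;')
    # The label has to exist before load_labels_from_file is called.
    cursor.execute("SELECT create_vlabel(%s, %s);", (graph_name, label))
    # Same loading call as in the question, with the label passed as a parameter.
    cursor.execute(
        "SELECT load_labels_from_file(%s, %s, %s);",
        (graph_name, label, csv_path),
    )
```

In the question's script this would replace the three execute calls inside load_csv_nodes, for example create_label_then_load(cursor, graph_name, 'Node', csv_file), followed by conn.commit(). As far as I know, create_vlabel fails if the label already exists, so a second run against the same graph would likely need the label dropped or the call skipped.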
That line is not properly written; you need to fix it with the correct path.
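The answer does not say which line it means. Assuming it refers to the placeholder path_to_csv = '/path/to/folder/' + new_csv_file in the question, one way to avoid a hand-typed path is to resolve the generated file's absolute path and check that it exists before loading. The helper name absolute_csv_path below is hypothetical.

```python
from pathlib import Path

def absolute_csv_path(csv_name):
    """Return the absolute path of csv_name, raising if the file is missing."""
    path = Path(csv_name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"CSV file not found: {path}")
    return str(path)
```

For example, path_to_csv = absolute_csv_path(new_csv_file). This only helps if the script and the PostgreSQL server run on the same machine and the server process can read the file, since the path is presumably opened on the server side.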