
I have a JSON file that defines the nodes and their relationships. It looks something like this:

{"p":{"type":"node","id":"0","labels":["Paintings"],"properties":{"date":"1659-01-01T00:00:00","img":"removed-for-brevity(RFB)","name":"King Caspar","sitelink":"1","description":"RFB","exhibit":"RAB","uri":"RFB"}},"r":{"id":"144","type":"relationship","label":"on_MATERIAL","start":{"id":"0","labels":["Paintings"]},"end":{"id":"2504","labels":["Material"]}},"n":{"type":"node","id":"2504","labels":["Material"],"properties":{"name":"oak","sitelink":5,"description":"RFB","uri":"RFB"}}}

"p" is the first node, "r" is the relationship, "n" is the second node.
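To make the shape concrete, here is a minimal Python sketch that splits one such line into its start node, relationship and end node. It assumes the file holds one JSON object per line; `split_record` is a hypothetical helper name, and the sample line is trimmed to a single property per node.

```python
import json


def split_record(line):
    """Split one line of the export into (start node, relationship, end node).

    Hypothetical helper; assumes the "p" / "r" / "n" keys of the sample.
    """
    record = json.loads(line)
    p, r, n = record["p"], record["r"], record["n"]
    start = (p["labels"][0], p["id"], p["properties"])   # (label, id, properties)
    rel = (r["label"], r["start"]["id"], r["end"]["id"])  # (type, start id, end id)
    end = (n["labels"][0], n["id"], n["properties"])
    return start, rel, end


# The sample line from the question, trimmed to one property per node
line = (
    '{"p":{"type":"node","id":"0","labels":["Paintings"],"properties":{"name":"King Caspar"}},'
    '"r":{"id":"144","type":"relationship","label":"on_MATERIAL",'
    '"start":{"id":"0","labels":["Paintings"]},"end":{"id":"2504","labels":["Material"]}},'
    '"n":{"type":"node","id":"2504","labels":["Material"],"properties":{"name":"oak"}}}'
)
start, rel, end = split_record(line)
print(start[:2], rel, end[:2])
# prints: ('Paintings', '0') ('on_MATERIAL', '0', '2504') ('Material', '2504')
```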

Is it possible for Neo4j to create a graph automatically from this JSON file, without having to define the nodes and relationships manually in Cypher?

I am fairly new to Neo4j. I tried following the examples on the Load JSON page, but they define the nodes and their relationships manually, which I want to avoid.


Answers


  1. Chosen as BEST ANSWER

    It looks like Neo4j can't automatically create a graph data model from a JSON file (as @cybersam pointed out earlier).

    I ended up writing a Python script to do this for me. Posting this here just in case it helps someone. It does the job for me!

    from neo4j import GraphDatabase
    import json
    
    # Connect to Neo4j
    uri = "bolt://localhost:7687"
    username = "_username_"
    password = "_password_"
    
    driver = GraphDatabase.driver(uri, auth=(username, password))
    
    processed_painting_ids = set()  # maintaining a set to track unique painting ("p") node IDs
    processed_node_ids = set()      # tracks unique "n" node IDs (e.g. Material)
    
    
    def quote(name):
        # Labels and relationship types can't be query parameters, so they are
        # written into the query text; backticks keep unusual names valid.
        return "`" + name.replace("`", "``") + "`"
    
    
    # Load JSON data from file (one JSON object per line)
    with open("data_json.json", "r") as file, driver.session() as session:
        for line in file:
            json_data = json.loads(line)
    
            p_data = json_data["p"]
            r_data = json_data["r"]
            n_data = json_data["n"]
    
            # Handle missing values in the data; ids are numeric strings, stored as integers
            p_props = p_data["properties"]
            p_node = {
                "id": int(p_data["id"]),
                "date": str(p_props.get("date", "Unknown date")),
                "img": p_props.get("img", "Unknown img"),
                "name": p_props.get("name", "Unknown name"),
                "sitelink": str(p_props.get("sitelink", "Unknown sitelink")),
                "description": p_props.get("description", "Unknown description"),
                "exhibit": p_props.get("exhibit", "Unknown exhibit"),
                "uri": str(p_props.get("uri", "Unknown uri")),
            }
    
            n_props = n_data["properties"]
            n_node = {
                "id": int(n_data["id"]),
                "name": n_props.get("name", "Unknown name"),
                "sitelink": str(n_props.get("sitelink", "Unknown sitelink")),
                "description": n_props.get("description", "Unknown description"),
                "uri": n_props.get("uri", "Unknown uri"),  # the node's uri, not the Bolt uri
            }
    
            # Create the "n" (e.g. Material) node once; values are sent as parameters,
            # so quotes inside names or descriptions can't break the query
            if n_node["id"] not in processed_node_ids:
                session.run("CREATE (n:" + quote(n_data["labels"][0]) + " $props)", props=n_node)
                processed_node_ids.add(n_node["id"])
    
            # Check if the "p" node is repetitive; create it only once
            if p_node["id"] not in processed_painting_ids:
                session.run("CREATE (p:" + quote(p_data["labels"][0]) + " $props)", props=p_node)
                processed_painting_ids.add(p_node["id"])
    
            # Create the "r" relationship, matching each end by its label and id
            session.run(
                "MATCH (src:" + quote(r_data["start"]["labels"][0]) + " {id: $start_id}), "
                "(dst:" + quote(r_data["end"]["labels"][0]) + " {id: $end_id}) "
                "CREATE (src)-[r:" + quote(r_data["label"]) + " {id: $r_id}]->(dst)",
                start_id=int(r_data["start"]["id"]),
                end_id=int(r_data["end"]["id"]),
                r_id=int(r_data["id"]),
            )
    
    driver.close()
    

  2. No, there is no automated way, and even if there were, the generated result could be suboptimal or even wrong for your use cases.

    You need to design the graph data model (node labels, relationship types, etc.) yourself. There are many considerations (like your use cases, and the necessary indexes and constraints) that are not revealed by a simple JSON data dump. Also, you need to understand the schema of the JSON and determine how to map that to your data model.
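As a sketch of what doing that mapping yourself can look like, the Python below turns one `p`/`r`/`n` record into parameterized Cypher. It assumes every node is identified by its `id` property and that each label gets a uniqueness constraint, so `MERGE` can be re-run without creating duplicates. The helper names (`quote`, `constraint_for`, `statements_for`) are hypothetical, and the `CREATE CONSTRAINT ... FOR ... REQUIRE` syntax assumes a recent Neo4j version (4.4 or later). It only builds query strings and parameter maps; it does not talk to a database.

```python
import json


def quote(name):
    # Backtick-escape a label or relationship type; these cannot be passed as parameters.
    return "`" + name.replace("`", "``") + "`"


def constraint_for(label):
    # Assumed schema choice: a uniqueness constraint on :Label(id), which is also backed by an index.
    return f"CREATE CONSTRAINT IF NOT EXISTS FOR (x:{quote(label)}) REQUIRE x.id IS UNIQUE"


def statements_for(record):
    """Map one p/r/n record to a list of (query, params) pairs.

    Hypothetical mapping: nodes are keyed by their "id"; every other
    property is copied unchanged. MERGE makes re-running the import safe.
    """
    p, r, n = record["p"], record["r"], record["n"]
    stmts = []
    for node in (p, n):
        stmts.append((
            f"MERGE (x:{quote(node['labels'][0])} {{id: $id}}) SET x += $props",
            {"id": node["id"], "props": node["properties"]},
        ))
    stmts.append((
        f"MATCH (a:{quote(r['start']['labels'][0])} {{id: $start_id}}), "
        f"(b:{quote(r['end']['labels'][0])} {{id: $end_id}}) "
        f"MERGE (a)-[rel:{quote(r['label'])} {{id: $rel_id}}]->(b)",
        {"start_id": r["start"]["id"], "end_id": r["end"]["id"], "rel_id": r["id"]},
    ))
    return stmts


# Sample record (trimmed) in the question's format
record = json.loads(
    '{"p":{"type":"node","id":"0","labels":["Paintings"],"properties":{"name":"King Caspar"}},'
    '"r":{"id":"144","type":"relationship","label":"on_MATERIAL",'
    '"start":{"id":"0","labels":["Paintings"]},"end":{"id":"2504","labels":["Material"]}},'
    '"n":{"type":"node","id":"2504","labels":["Material"],"properties":{"name":"oak","sitelink":5}}}'
)

for label in ("Paintings", "Material"):
    print(constraint_for(label))
for query, params in statements_for(record):
    print(query, params)
```

With the official driver, the constraint statements would be run once before the import, and each pair passed as `session.run(query, params)`. Whether `id` is really the right key, and which properties belong on nodes versus relationships, is exactly the modelling decision that has to be made by hand.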
